Abstract. We study an admissions control problem, where a queue
with service rate $1-p$ receives incoming jobs at rate $\lambda \in (1-p, 1)$,
and the decision maker is allowed to redirect away jobs up to a rate
of $p$, with the objective of minimizing the time-average queue length.

We show that the amount of information about the future has a
significant impact on system performance, in the heavy-traffic regime.
When the future is unknown, the optimal average queue length diverges
at rate $\sim \log_{\frac{1}{1-p}} \frac{1}{1-\lambda}$, as $\lambda \to 1$. In sharp
contrast, when all future arrival and service times are revealed beforehand,
the optimal average queue length converges to a finite constant, $(1-p)/p$,
as $\lambda \to 1$. We further show that the finite limit of $(1-p)/p$ can be
achieved using only a finite lookahead window starting from the current time
frame, whose length scales as $O(\log \frac{1}{1-\lambda})$, as $\lambda \to 1$.
This leads to the conjecture of an interesting duality between queuing delay
and the amount of information about the future.
1. Introduction.

1.1. "Variable, but Predictable". The notion of queues has been used
extensively as a powerful abstraction in studying dynamic resource alloca- 
tion systems, where one aims to match demands that arrive over time with 
available resources, and a queue is used to store currently unprocessed de- 
mands. Two important ingredients often make the design and analysis of a 
queueing system difficult: the demands and resources can be both variable 
and unpredictable. Variability refers to the fact that the arrivals of demands 
or the availability of resources can be highly volatile and non-uniformly 
distributed across the time horizon. Unpredictability means that such non- 
uniformity "tomorrow" is unknown to the decision maker "today" , and she 
is obliged to make allocation decisions only based on the state of the system 
at the moment, and some statistical estimates of the future. 

While the world will remain volatile as we know it, in many cases, the 
amount of unpredictability about the future may be reduced thanks to fore- 
casting technologies and the increasing accessibility of data. For instance, 

1. advance booking in the hotel and textile industries allows for accurate
forecasting of demands ahead of time [12]; 

2. the availability of monitoring data enables traffic controllers to predict 
the traffic pattern around potential bottlenecks [4]; 

3. advance scheduling for elective surgeries could inform care providers 
several weeks before the intended appointment [11]. 

In all of these examples, future demands remain exogenous and variable, yet
(some of) their realizations are revealed to the decision maker.

Is there significant performance gain to be harnessed by "looking into the 
future"? In this paper we provide a largely affirmative answer, in the context 
of a class of admissions control problems. 

1.2. Admissions Control Viewed as Resource Allocation. We begin by
informally describing our problem. Consider a single queue equipped with a
server that runs at rate $1-p$ jobs per unit time, where $p$ is a fixed constant
in $(0, 1)$, as depicted in Figure 1.1. The queue receives a stream of incoming
jobs, arriving at rate $\lambda \in (0, 1)$. If $\lambda > 1-p$, the arrival rate is
greater than the server's processing rate, and some form of admissions control
is necessary in order to keep the system stable. In particular, upon its arrival
to the system, a job will either be admitted to the queue, or redirected. In the
latter case, the job does not join the queue, and, from the perspective of the
queue, disappears from the system entirely. The goal of the decision maker is to
minimize the average delay experienced by the admitted jobs, while obeying
the constraint that the average rate at which jobs are redirected does not
exceed $p$. 1

Figure 1.1. An illustration of the admissions control problem, with a constraint
on the rate of redirection.

One can think of our problem as that of resource allocation, where a de- 
cision maker tries to match incoming demands with two types of processing 
resources: a slow local resource that corresponds to the server, and a fast 
external resource that can process any job redirected to it almost
instantaneously. Both types of resources are constrained, in the sense that their
capacities ($1-p$ and $p$, respectively) cannot change over time, by physical
or contractual predispositions. The processing time of a job at the fast
resource is negligible compared to that at the slow resource, as long as the 
rate of redirection to the fast resource stays below p in the long run. Under 
this interpretation, minimizing the average delay across all jobs is equiv- 
alent to minimizing the average delay across just the admitted jobs, since 
the jobs redirected to the fast resource can be thought of as being processed
immediately and experiencing no delay at all. 

For a more concrete example, consider a web service company that enters
a long term contract with an external cloud computing provider for a
fixed amount of computation resources (e.g., virtual machine instance time)
over the contract period. 2 During the contract period, any incoming request
can be either served by the in-house server (slow resource), or be redirected
to the cloud (fast resource), and in the latter case, the job does not
experience congestion delay since the scalability of the cloud allows for multiple
VM instances to run in parallel (and potentially on different physical
machines). The decision maker's constraint is that the total amount of
redirected jobs to the cloud must stay below the amount prescribed by the
contract, which, in our case, translates into a maximum redirection rate over
the contract period. Similar scenarios can also arise in other domains, where
the slow versus fast resources could, for instance, take on the forms of:

1 Note that as $\lambda \to 1$, the minimum rate of admitted jobs, $\lambda - p$,
approaches the server's capacity $1-p$, and hence we will refer to the system's
behavior when $\lambda \to 1$ as the heavy-traffic regime.

2 Example. As of September 2012, Microsoft's Windows Azure cloud services offer a
6-month contract for $71.99 per month, where the client is entitled to up to 750 hours
of virtual machine (VM) instance time each month, and any additional usage would be
charged at a 25% higher rate. Due to the large scale of the Azure data warehouses, the
speed of any single VM instance can be treated as roughly constant, and independent of
the total number of instances that the client is running concurrently.

1. an in-house manufacturing facility, versus an external contractor;

2. a slow toll booth on the freeway, versus a special lane that lets a car 
pass without paying the toll; 

3. hospital bed resources within a single department, versus a cross- 
departmental central bed pool. 

In a recent work [17], a mathematical model was proposed to study the 
benefits of resource pooling in large scale queueing systems, which is also 
closely connected to our problem. They consider a multi-server system where 
a fraction 1 - p of a total of N units of processing resources (e.g., CPUs) 
is distributed among a set of N local servers, each running at rate 1 - p, 
while the remaining fraction of p is being allocated in a centralized fashion, 
in the form of a central server that operates at rate pN (See Figure 5.1). 
It is not difficult to see, when N is large, the central server operates at a 
significantly faster speed than the local servers, so that a job processed at the 
central server experiences little or no delay. In fact, the admissions control 
problem studied in this paper is essentially the problem faced by one of the 
local servers, in the regime where N is large (Figure 5.2). This connection
will be explored in greater detail in Section 5, where we discuss the
implications of our results in the context of resource pooling systems.

1.3. Overview of Main Contributions. We preview some of the main re- 
sults in this section. The formal statements will be given in Section 3. 

1.3.1. Summary of the Problem. We consider a continuous-time admissions
control problem, depicted in Figure 1.1. The problem is characterized
by three parameters, $\lambda$, $p$, and $w$:

1. Jobs arrive to the system at a rate of $\lambda$ jobs per unit time, with
$\lambda \in (0,1)$. The server operates at a rate of $1-p$ jobs per unit time,
with $p \in (0,1)$.

2. The decision maker is allowed to decide whether an arriving job is 
admitted to the queue, or redirected away, with the goal of minimizing 
the time-average queue length 3 , and subject to the constraint that the 



3 By Little's Law, the average queue length is essentially the same as average delay, up 
time-average rate of redirection does not exceed $p$ jobs per unit time.
3. The decision maker has access to information about the future, which
takes the form of a lookahead window of length $w \in \mathbb{R}_+$. In particular,
at any time $t$, the times of arrivals and service availability within the
interval $[t, t+w]$ are revealed to the decision maker. We will consider
the following cases of $w$.

(a) $w = 0$, the online problem, where no future information is available.

(b) $w = \infty$, the offline problem, where the entire future has been
revealed.

(c) $0 < w < \infty$, where the future is revealed only up to a finite lookahead
window.

Throughout, we will fix $p \in (0,1)$, and be primarily interested in the
system's behavior in the heavy-traffic regime of $\lambda \to 1$.

1.3.2. Overview of Main Results. Our main contribution is to demon- 
strate that the performance of a redirection policy is highly sensitive to the 
amount of future information available, measured by the value of w. 

Fix $p \in (0,1)$, and let the arrival and service processes be Poisson. For
the online problem ($w = 0$), we show the optimal time-average queue length,
$C^*_{\Pi_0}$, approaches infinity in the heavy-traffic regime, at the rate

$$C^*_{\Pi_0} \sim \log_{\frac{1}{1-p}} \frac{1}{1-\lambda}, \quad \text{as } \lambda \to 1.$$

In sharp contrast, the optimal average queue length among offline policies
($w = \infty$), $C^*_{\Pi_\infty}$, converges to a constant,

$$C^*_{\Pi_\infty} \to \frac{1-p}{p}, \quad \text{as } \lambda \to 1,$$

and this limit is achieved by a so-called No-Job-Left-Behind policy. Figure
1.2 illustrates this difference in delay performance for a particular value of
$p$.

Finally, we show that the No-Job-Left-Behind policy for the offline problem
can be modified, so that the same optimal heavy-traffic limit of $\frac{1-p}{p}$ is
achieved even with a finite lookahead window, $w(\lambda)$, where

$$w(\lambda) = O\left(\log \frac{1}{1-\lambda}\right), \quad \text{as } \lambda \to 1.$$



to a constant factor. See Section 2.5. 



[Figure 1.2 plot: average queue length (vertical axis, 0 to 70) versus traffic
intensity $\lambda$ (horizontal axis, 0.97 to 0.999), with one curve for
Offline ($\pi_{NOB}$) and one for Online ($\pi_{th}^{L}$).]

Figure 1.2. Comparison of optimal heavy-traffic delay scalings between online and offline
policies, with $p = 0.1$ and $\lambda \to 1$. The value $C(p, \lambda, \pi)$ is the resulting
average queue length as a function of $p$, $\lambda$, and a policy $\pi$.



This is of practical importance, because in any realistic application only a
finite amount of future information can be obtained.
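
As a rough numerical illustration of the gap between the two regimes, the following Python sketch evaluates the online growth rate $\log_{1/(1-p)} \frac{1}{1-\lambda}$ and the offline limit $(1-p)/p$ for $p = 0.1$, the value used in Figure 1.2. The function names are ours and purely illustrative. Note that the online expression is only the asymptotic rate as $\lambda \to 1$, not the exact optimal queue length at a given $\lambda$.

```python
import math


def online_scaling(p: float, lam: float) -> float:
    """Asymptotic growth rate log_{1/(1-p)}(1/(1-lam)) for online policies."""
    return math.log(1.0 / (1.0 - lam)) / math.log(1.0 / (1.0 - p))


def offline_limit(p: float) -> float:
    """Heavy-traffic limit (1-p)/p of the optimal offline queue length."""
    return (1.0 - p) / p


if __name__ == "__main__":
    p = 0.1
    for lam in (0.97, 0.99, 0.999):
        print(
            f"lambda={lam}: online rate ~ {online_scaling(p, lam):.1f}, "
            f"offline limit = {offline_limit(p):.1f}"
        )
```

The online rate keeps growing as $\lambda$ approaches 1, while the offline limit depends on $p$ alone.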

On the methodological end, we use a sample-path-based framework to an- 
alyze the performance of the offline and finite-lookahead policies, borrowing 
tools from renewal theory and the theory of random walks. We believe that 
our techniques could be substantially generalized to incorporate general ar- 
rival and service processes, diffusion approximations, as well as observational 
noises. See Section 9 for a more elaborate discussion. 

1.4. Related Work. There is an extensive body of literature devoted to 
various Markov (or online) admissions control problems; the reader is re- 
ferred to the survey of [5], and references therein. Typically, the problem is 
formulated as an instance of a Markov decision problem (MDP), where the 
decision maker, by admitting or rejecting incoming jobs, seeks to maximize a 
long-term average objective consisting of rewards (e.g., throughput) minus 
costs (e.g., waiting time experienced by a customer). The case where the 
maximization is performed subject to a constraint on some average cost has 
also been studied, and it has been shown, for a family of reward and cost 
functions, that an optimal policy assumes a "threshold-like" form, where 
the decision maker redirects the next job only if the current queue length is 
greater than or equal to $L$, with possible randomization if at level $L-1$, and
always admits the job if below $L-1$ (cf. [2]). Indeed, our problem, where one tries
to minimize average queue length (delay) subject to a lower bound on the
throughput (i.e., a maximum redirection rate), can be shown to belong to
this category, and the online heavy-traffic scaling result is a straightforward
extension following the MDP framework, albeit dealing with technicalities
in extending the threshold characterization to an infinite state space, since
we are interested in the regime of $\lambda \to 1$.

However, the resource allocation interpretation of our admissions control 
problem as that of matching jobs with fast and slow resources, and, in par- 
ticular, its connections to resource pooling in the many-server limit, seems 
to be largely unexplored. The difference in motivation perhaps explains why
the optimal online heavy-traffic delay scaling of
$\log_{\frac{1}{1-p}} \frac{1}{1-\lambda}$ that emerges by fixing $p$ and taking
$\lambda \to 1$ has not appeared in the literature, to the best of our
knowledge.

In sharp contrast to our knowledge of the online problems, significantly 
less is known for settings in which information about the future is taken into 
consideration. In [6], the author considers a variant of the flow control prob- 
lem where the decision maker knows the job size of the arriving customer, as 
well as the arrival time and job size of the next customer, with the goal
of maximizing certain discounted or average reward. A characterization of 
an optimal stationary policy is derived under a standard semi-Markov deci- 
sion problem framework, since the lookahead is limited to the next arriving 
job. In [7], the authors consider a scheduling problem with one server and 
M parallel queues, motivated by applications in satellite systems where the 
link qualities between the server and the queues vary over time. The au- 
thors compare the throughput performance between several online policies 
with that of an offline policy, which has access to all future instances of link 
qualities. However, the offline policy takes the form of a Viterbi-like dynamic 
program, which, while being throughput-optimal by definition, provides lim- 
ited qualitative insight. 

One challenge that arises as one tries to move beyond the online setting is 
that policies with lookahead typically do not admit a clean Markov descrip- 
tion, and hence common techniques for analyzing Markov decision problems 
do not easily apply. To circumvent the obstacle, we will first relax our prob- 
lem to be fully offline, which turns out to be surprisingly amenable to analy- 
sis. We then use the insights from the optimal offline policy to construct an 
optimal policy with a finite look-ahead window, in a rather straightforward 
manner. 

In other application domains, the idea of exploiting future information or
predictions to improve decision making has been explored. Advance
reservations (a form of future information) have been studied in lossy networks
[8, 9] and, more recently, in revenue management [10]. Using simulations,
[11] demonstrates that the use of a one- and two-week advance scheduling
window for elective surgeries can improve the efficiency at the associated
intensive care unit (ICU). The benefits of advance booking programs for
supply chains have been shown in [12] in the form of reduced demand
uncertainties. While similar in spirit, the motivations and dynamics in these
models are very different from ours.

Finally, our formulation of the slow and fast resources has been in part
inspired by the literature on resource pooling systems, where one improves
overall system performance by (partially) sharing individual resources in a
collective manner. The connection of our problem to a specific multi-server
model proposed by [17] will be discussed in Section 5. For the general topic 
of resource pooling, interested readers are referred to [13, 14, 15, 16] and the 
references therein. 

1.5. Organization of the Paper. The rest of the paper is organized as 
follows. The mathematical model for our problem is described in Section 
2. Section 3 contains the statements of our main results, and introduces 
the No-Job-Left-Behind policy ($\pi_{NOB}$), which will be a central object of
study for this paper. Section 4 presents two alternative interpretations of 
the No-Job-Left-Behind policy (as a "stack" and "cave", respectively) that 
have important structural, as well as algorithmic, implications. Sections 6 
through 8 are devoted to the proofs for the results concerning the online, 
offline and finite-lookahead policies, respectively. Finally, Section 9 contains 
some concluding remarks and future directions. 

2. Model and Setup. 

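2.1. Notation. We will denote by $\mathbb{N}$, $\mathbb{Z}_+$, and $\mathbb{R}_+$ the set of
natural numbers, non-negative integers, and non-negative reals, respectively. Let
$f, g : \mathbb{R}_+ \to \mathbb{R}_+$ be two functions. We will use the following asymptotic
notation throughout: $f(x) \lesssim g(x)$ if $\lim_{x \to 1} \frac{f(x)}{g(x)} \le 1$,
$f(x) \gtrsim g(x)$ if $\lim_{x \to 1} \frac{f(x)}{g(x)} \ge 1$;
$f(x) \ll g(x)$ if $\lim_{x \to 1} \frac{f(x)}{g(x)} = 0$, and $f(x) \gg g(x)$ if
$\lim_{x \to 1} \frac{f(x)}{g(x)} = \infty$.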

2.2. System Dynamics. An illustration of the system setup is given in 
Figure 1.1. The system consists of a single-server queue running in continuous
time ($t \in \mathbb{R}_+$), with an unbounded buffer that stores all unprocessed
jobs. The queue is assumed to be empty at t = 0. 
Jobs arrive to the system according to a Poisson process with rate $\lambda$,
$\lambda \in (0, 1)$, so that the intervals between two adjacent arrivals are independent
and exponentially distributed with mean $1/\lambda$. We will denote by
$\{A(t) : t \in \mathbb{R}_+\}$ the cumulative arrival process, where $A(t) \in \mathbb{Z}_+$
is the total number of arrivals to the system by time $t$.

The processing of jobs by the server is modeled by a Poisson process of
rate $1-p$. When the service process receives a jump at time $t$, we say that
a service token is generated. If the queue is not empty at time $t$, exactly
one job "consumes" the service token and leaves the system immediately.
Otherwise, the service token is "wasted" and has no impact on the future
evolution of the system. 4 We will denote by $\{S(t) : t \in \mathbb{R}_+\}$ the cumulative
token generation process, where $S(t) \in \mathbb{Z}_+$ is the total number of service
tokens generated by time $t$.

When $\lambda > 1-p$, in order to maintain the stability of the queue, a decision
maker has the option of "redirecting" a job at the moment of its arrival.
Once redirected, a job effectively "disappears", and for this reason, we will
use the word deletion as a synonymous term for redirection throughout the
rest of the paper, because it is more intuitive to think of deleting a job in
our subsequent sample-path analysis. Finally, the decision maker is allowed
to delete up to a time-average rate of $p$.

2.3. Initial Sample Path. Let $\{Q^0(t) : t \in \mathbb{R}_+\}$ be the continuous-time
queue length process, where $Q^0(t) \in \mathbb{Z}_+$ is the queue length at time $t$ if no
deletion is applied at any time. We say that an event occurs at time $t$, if
there is either an arrival, or a generation of service token, at time $t$. Let $T_n$,
$n \in \mathbb{N}$, be the time of the $n$th event in the system. Denote by
$\{Q^0[n] : n \in \mathbb{Z}_+\}$ the embedded discrete-time process of $\{Q^0(t)\}$, where
$Q^0[n]$ is the length of the queue sampled immediately after the $n$th event, 5

$$Q^0[n] = Q^0(T_n-), \quad n \in \mathbb{N},$$

with the initial condition $Q^0[0] = 0$. It is well-known that $Q^0$ is a random
walk on $\mathbb{Z}_+$, such that for all $x_1, x_2 \in \mathbb{Z}_+$ and $n \in \mathbb{Z}_+$,

$$P\left(Q^0[n+1] = x_2 \mid Q^0[n] = x_1\right) =
\begin{cases}
\frac{\lambda}{\lambda + 1 - p}, & x_2 - x_1 = 1, \\
\frac{1-p}{\lambda + 1 - p}, & x_2 - x_1 = -1, \\
0, & \text{otherwise},
\end{cases} \tag{2.1}$$

if $x_1 > 0$, and

$$P\left(Q^0[n+1] = x_2 \mid Q^0[n] = x_1\right) =
\begin{cases}
\frac{\lambda}{\lambda + 1 - p}, & x_2 - x_1 = 1, \\
\frac{1-p}{\lambda + 1 - p}, & x_2 - x_1 = 0, \\
0, & \text{otherwise},
\end{cases} \tag{2.2}$$

if $x_1 = 0$. Note that, when $\lambda > 1-p$, the random walk $Q^0$ is transient.

4 When the queue is non-empty, the generation of a token can be interpreted as the
completion of a previous job, upon which the server is ready to fetch the next job. The
time between two consecutive tokens corresponds to the service time. The waste of a token
can be interpreted as the server starting to serve a "dummy job". Roughly speaking, the
service token formulation, compared to that of a constant speed server processing jobs with
exponentially distributed sizes, provides a performance upper-bound due to the inefficiency
caused by dummy jobs, but has very similar performance in the heavy-traffic regime, in
which the tokens are almost never wasted. Using such a point process to model services
is not new, and the reader is referred to [17] and the references therein.

It is, however, important to note a key assumption implicit in the service token
formulation: the processing times are intrinsic to the server, and independent of the job
being processed. For instance, the sequence of service times will not depend on the order in
which the jobs in the queue are served, so long as the server remains busy throughout the
period. This distinction is of little relevance for an M/M/1 queue, but can be important
in our case, where the redirection decisions may depend on the future.

The process $Q^0$ contains all relevant information in the arrival and service
processes, and will be the main object of study of this paper. We will refer
to $Q^0$ as the initial sample path throughout the paper, to distinguish it from
sample paths obtained after deletions have been made.
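
To make the embedded process concrete, here is a minimal Python sketch that samples a finite prefix of the initial sample path, assuming the transition rule of Eqs. (2.1)-(2.2): each event is an arrival with probability $\lambda/(\lambda+1-p)$ and a service token otherwise, and a token generated at an empty queue is wasted. The function name `simulate_initial_path` and the `seed` argument are our own illustrative choices, not part of the paper.

```python
from __future__ import annotations

import random


def simulate_initial_path(lam: float, p: float, n_events: int, seed: int = 0) -> list[int]:
    """Return [Q0[0], ..., Q0[n_events]] for the embedded walk with no deletions."""
    rng = random.Random(seed)
    p_arrival = lam / (lam + 1.0 - p)  # probability that an event is an arrival
    path = [0]  # the queue is empty at time 0
    for _ in range(n_events):
        if rng.random() < p_arrival:
            path.append(path[-1] + 1)  # arrival: queue grows by one
        else:
            # service token: consumed if the queue is non-empty, wasted otherwise
            path.append(max(path[-1] - 1, 0))
    return path


if __name__ == "__main__":
    q0 = simulate_initial_path(lam=0.95, p=0.1, n_events=20)
    print(q0)
```

With $\lambda > 1-p$ the sampled path drifts upward on average, which is the transience noted above.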

2.4. Deletion Policies. Since a deletion can only take place when there
is an arrival, it suffices to define the locations of deletions with respect to
the discrete-time process $\{Q^0[n] : n \in \mathbb{Z}_+\}$, and throughout, our analysis will
focus on discrete-time queue length processes unless otherwise specified. Let
$\Phi(Q)$ be the locations of all arrivals in a discrete-time queue length process
$Q$, i.e.,

$$\Phi(Q) = \{n \in \mathbb{N} : Q[n] > Q[n-1]\},$$

and for any $M \subset \mathbb{Z}_+$, define the counting process $\{I(M, n) : n \in \mathbb{N}\}$
associated with $M$ as 6

$$I(M, n) = |\{1, \ldots, n\} \cap M|. \tag{2.3}$$

Definition 1. (Feasible Deletion Sequence) The sequence $M = \{m_i\}$
is said to be a feasible deletion sequence with respect to a discrete-time
queue length process, $Q^0$, if all of the following hold:



5 The notation $f(x-)$ denotes the right-limit of $f$ at $x$: $f(x-) = \lim_{y \downarrow x} f(y)$.
In this particular context, the values of $Q^0[n]$ are well defined, since the sample paths of
Poisson processes are right-continuous-with-left-limits (RCLL) almost surely.
6 $|X|$ denotes the cardinality of $X$.
1. All elements in $M$ are unique, so that at most one deletion occurs at
any slot.

2. $M \subset \Phi(Q^0)$, so that a deletion occurs only when there is an arrival.

3.

$$\limsup_{n \to \infty} \frac{1}{n} I(M, n) \le \frac{p}{\lambda + (1-p)}, \quad \text{a.s.}, \tag{2.4}$$

so that the time-average deletion rate is at most $p$.
In general, $M$ is also allowed to be a finite set.

The denominator $\lambda + (1-p)$ in Eq. (2.4) is due to the fact that the total
rate of events in the system is $\lambda + (1-p)$. 7 Analogously, the deletion rate in
continuous time is defined by

$$r_d = (\lambda + 1 - p) \cdot \limsup_{n \to \infty} \frac{1}{n} I(M, n). \tag{2.5}$$

The impact of a deletion sequence on the evolution of the queue length
process is formalized in the following definition.

Definition 2. (Deletion Maps) Fix an initial queue length process
$\{Q^0[n] : n \in \mathbb{N}\}$ and a corresponding feasible deletion sequence $M = \{m_i\}$.

1. The point-wise deletion map $D_P(Q^0, m)$ outputs the resulting process
after a deletion is made to $Q^0$ in slot $m$. Let $Q' = D_P(Q^0, m)$.
Then

$$Q'[n] =
\begin{cases}
Q^0[n] - 1, & n \ge m, \text{ and } Q^0[t] > 0, \ \forall t \in \{m, \ldots, n\}, \\
Q^0[n], & \text{otherwise}.
\end{cases} \tag{2.6}$$

2. The multi-point deletion map $D(Q^0, M)$ outputs the resulting process
after all deletions in the set $M$ are made to $Q^0$. Define $Q^i$ recursively
as $Q^i = D_P(Q^{i-1}, m_i)$, $\forall i \in \mathbb{N}$. Then, $Q^\infty = D(Q^0, M)$ is defined
as the point-wise limit

$$Q^\infty[n] = \lim_{i \to \min\{|M|, \infty\}} Q^i[n], \quad \forall n \in \mathbb{Z}_+. \tag{2.7}$$

The definition of the point-wise deletion map reflects the earlier assumption
that the service time of a job only depends on the speed of the server at
the moment, and is independent of the job's identity (see Section 2). Note
also that the value of $Q^\infty[n]$ depends only on the total number of deletions
before $n$ (Eq. (2.6)), which is at most $n$, so the limit in Eq. (2.7) is justified.
Moreover, it is not difficult to see that the order in which the deletions are
made has no impact on the resulting sample path, as stated in the lemma
below. The proof is omitted.

7 This is equal to the total rate of jumps in $A(\cdot)$ and $S(\cdot)$.

Lemma 1. Fix an initial sample path $Q^0$, and let $M$ and $\tilde{M}$ be two
feasible deletion sequences that contain the same elements. Then
$D(Q^0, M) = D(Q^0, \tilde{M})$.
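
The deletion maps of Definition 2, and the order-independence stated in Lemma 1, can be illustrated on a finite path. The sketch below is a minimal Python rendering under our reading of Eq. (2.6): a deletion in slot $m$ lowers the path by one from slot $m$ onward, until the path being modified first hits zero. The names `delete_at` and `delete_all` are hypothetical, and the path is truncated to a finite list for illustration only.

```python
from __future__ import annotations


def delete_at(q: list[int], m: int) -> list[int]:
    """Point-wise deletion map: remove the arrival in slot m from path q."""
    out = list(q)
    for n in range(m, len(q)):
        if q[n] <= 0:
            break  # the path hit zero, so the deletion has no further effect
        out[n] = q[n] - 1
    return out


def delete_all(q: list[int], deletions: list[int]) -> list[int]:
    """Multi-point deletion map: apply the point-wise map for each deletion in turn."""
    out = list(q)
    for m in deletions:
        out = delete_at(out, m)
    return out


if __name__ == "__main__":
    q0 = [0, 1, 2, 1, 2, 3, 2, 1, 0, 1]
    # Lemma 1: the order of the deletions does not change the result.
    print(delete_all(q0, [1, 4]))
    print(delete_all(q0, [4, 1]))
```

Both calls return the same path, which is what Lemma 1 asserts for this small example.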

We next define the notion of a deletion policy, which outputs a deletion 
sequence based on the (limited) knowledge of an initial sample path Q°. 
Informally, a deletion policy is said to be w-lookahead if it makes its deletion 
decisions based on the knowledge of Q° up to w units of time into the future 
(in continuous time). 

Definition 3. ($w$-Lookahead Deletion Policies) Fix $w \in \mathbb{R}_+ \cup \{\infty\}$.
Let $\mathcal{F}_t = \sigma(Q^0(s); s \le t)$ be the natural filtration induced by
$\{Q^0(t) : t \in \mathbb{R}_+\}$, and $\mathcal{F}_\infty = \cup_{t \in \mathbb{Z}_+} \mathcal{F}_t$.
A $w$-lookahead deletion policy is a mapping, $\pi : \mathbb{Z}_+^{\mathbb{R}_+} \to \mathbb{N}^\infty$,
such that

1. $M = \pi(Q^0)$ is a feasible deletion sequence a.s.;

2. $\{n \in M\}$ is $\mathcal{F}_{T_n + w}$ measurable, for all $n \in \mathbb{N}$.

We will denote by $\Pi_w$ the family of all $w$-lookahead deletion policies.

The parameter w in Definition 3 captures the amount of information that 
the deletion policy has about the future. 

1. When $w = 0$, all deletion decisions are made solely based on the knowledge
of the system up till the current time frame. We will refer to $\Pi_0$
as online policies.

2. When $w = \infty$, the entire sample path of $Q^0$ is revealed to the decision
maker at $t = 0$. We will refer to $\Pi_\infty$ as offline policies.

3. We will refer to $\Pi_w$, $0 < w < \infty$, as policies with a lookahead window of
size $w$.

2.5. Performance Measure. Given a discrete-time queue length process
$Q$ and $n \in \mathbb{N}$, denote by $S(Q, n) \in \mathbb{Z}_+$ the partial sum

$$S(Q, n) = \sum_{k=1}^{n} Q[k]. \tag{2.8}$$
Definition 4. (Average Post-deletion Queue Length) Let $Q^0$ be
an initial queue length process. Define $C(p, \lambda, \pi) \in \mathbb{R}_+$ as the expected
average queue length after applying a deletion policy $\pi$:

$$C(p, \lambda, \pi) = E\left(\limsup_{n \to \infty} \frac{1}{n} S\left(Q^\infty_\pi, n\right)\right), \tag{2.9}$$

where $Q^\infty_\pi = D(Q^0, \pi(Q^0))$, and the expectation is taken over all realizations
of $Q^0$, and the randomness used by $\pi$ internally, if any.

Remark: Delay versus Queue Length. By Little's Law, the long-term average
waiting time of a typical customer in the queue is equal to the long-term
average queue length divided by the arrival rate (independent of the
service discipline of the server). Therefore, if our goal is to minimize the
average waiting time of the jobs that remain after deletions, it suffices to use
$C(p, \lambda, \pi)$ as a performance metric in order to judge the effectiveness of a
deletion policy $\pi$. In particular, denote by $T_{all} \in \mathbb{R}_+$ the time-average
queueing delay experienced by all jobs, where deleted jobs are assumed to have a
delay of zero, then $E(T_{all}) = \frac{1}{\lambda} C(p, \lambda, \pi)$, and hence the average
queue length and delay coincide in the heavy-traffic regime, as $\lambda \to 1$. With an
identical argument, it is easy to see that the average delay among admitted jobs,
$T_{adt}$, satisfies $E(T_{adt}) = \frac{1}{\lambda - r_d} C(p, \lambda, \pi)$, where $r_d$ is the
continuous-time deletion rate under $\pi$. Therefore, we may use the terms "delay" and
"average queue length" interchangeably in the rest of the paper, with the understanding
that they represent essentially the same quantity up to a constant.

Finally, we define the notion of an optimal delay within a family of policies. 

Definition 5. (Optimal Delay) Fix $w \in \mathbb{R}_+$. We call $C^*_{\Pi_w}(p, \lambda)$ the
optimal delay in $\Pi_w$, where

$$C^*_{\Pi_w}(p, \lambda) = \inf_{\pi \in \Pi_w} C(p, \lambda, \pi). \tag{2.10}$$

3. Summary of Main Results. We state the main results of this 
paper in this section, whose proofs will be presented in Sections 6 through 
8. 

3.1. Optimal Delay for Online Policies. 

Definition 6. (Threshold Policies) We say that $\pi_{th}^{L}$ is an $L$-threshold
policy, if a job arriving at time $t$ is deleted if and only if the queue length at
time $t$ is greater than or equal to $L$.
The following theorem shows that the class of threshold policies achieves
the optimal heavy-traffic delay scaling in $\Pi_0$.

Theorem 7. (Optimal Online Policies) Fix $p \in (0,1)$, and let

$$L(p, \lambda) = \left\lceil \log_{\frac{\lambda}{1-p}} \frac{p}{1-\lambda} \right\rceil.$$

Then,

1. $\pi_{th}^{L(p,\lambda)}$ is feasible for all $\lambda \in (1-p, 1)$.

2. $\pi_{th}^{L(p,\lambda)}$ is asymptotically optimal in $\Pi_0$ as $\lambda \to 1$:

$$C\left(p, \lambda, \pi_{th}^{L(p,\lambda)}\right) \sim C^*_{\Pi_0}(p, \lambda) \sim \log_{\frac{1}{1-p}} \frac{1}{1-\lambda}, \quad \text{as } \lambda \to 1.$$

Proof. See Section 6. □
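
The feasibility claim for threshold policies can be checked numerically. The sketch below is ours, not the paper's proof: it assumes the threshold level is $L(p,\lambda) = \lceil \log_{\lambda/(1-p)} \frac{p}{1-\lambda} \rceil$, and it assumes that under an $L$-threshold policy the queue length is a birth-death chain on $\{0, \ldots, L\}$ with up-rate $\lambda$ and down-rate $1-p$, so that its stationary distribution is proportional to $\rho^i$ with $\rho = \lambda/(1-p)$. The deletion rate is then $\lambda$ times the stationary probability of level $L$. All function names are hypothetical.

```python
from __future__ import annotations

import math


def threshold_level(p: float, lam: float) -> int:
    """Assumed threshold L(p, lam) = ceil(log_{lam/(1-p)}(p / (1-lam))), for lam > 1-p."""
    rho = lam / (1.0 - p)
    return math.ceil(math.log(p / (1.0 - lam)) / math.log(rho))


def threshold_stationary(p: float, lam: float, level: int) -> list[float]:
    """Stationary law on {0, ..., level} of the assumed birth-death chain."""
    rho = lam / (1.0 - p)
    weights = [rho**i for i in range(level + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def threshold_performance(p: float, lam: float, level: int) -> tuple[float, float]:
    """Return (deletion rate, mean queue length) under the level-threshold policy."""
    pi = threshold_stationary(p, lam, level)
    deletion_rate = lam * pi[level]  # arrivals that find the queue at the threshold
    mean_queue = sum(i * w for i, w in enumerate(pi))
    return deletion_rate, mean_queue


if __name__ == "__main__":
    p, lam = 0.1, 0.99
    level = threshold_level(p, lam)
    rate, mean_q = threshold_performance(p, lam, level)
    print(f"L={level}, deletion rate={rate:.4f} (budget {p}), mean queue={mean_q:.2f}")
```

Under these assumptions the computed deletion rate stays within the budget $p$, in line with the first part of the theorem.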

3.2. Optimal Delay for Offline Policies. Given the sample path of a random
walk $Q$, let $U(Q, n)$ be the number of slots till $Q$ reaches the level $Q[n] - 1$
after slot $n$:

$$U(Q, n) = \inf\{j \ge 1 : Q[n + j] = Q[n] - 1\}. \tag{3.1}$$

Definition 8. (No-Job-Left-Behind Policy 8) Given an initial sample
path $Q^0$, the No-Job-Left-Behind policy, denoted by $\pi_{NOB}$, deletes all
arrivals in the set $\Psi$, where

$$\Psi = \{n \in \Phi(Q^0) : U(Q^0, n) = \infty\}. \tag{3.2}$$

We will refer to the deletion sequence generated by $\pi_{NOB}$ as
$M^\Psi = \{m_i^\Psi : i \in \mathbb{N}\}$, where $M^\Psi = \Psi$.

In other words, $\pi_{NOB}$ would delete a job arriving at time $t$ if and only if
the initial queue length process never returns to below the current level in
the future, which also implies that

$$Q^0[n] \ge Q^0[m_i^\Psi], \quad \forall i \in \mathbb{N}, \ n \ge m_i^\Psi. \tag{3.3}$$

Examples of the $\pi_{NOB}$ policy being applied to a particular sample path are
given in Figures 3.1 and 3.2 (illustration), as well as in Figure 3.3
(simulation).
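
The rule defining the No-Job-Left-Behind policy can also be sketched in code. The Python function below is an illustration on a finite path, not the paper's construction: an arrival in slot $n$ is marked for deletion when the path never drops below $Q^0[n]$ afterwards, which for a walk with unit steps is the same as never reaching level $Q^0[n] - 1$. Because the list is finite, "never" here means "not before the end of the list", so arrivals near the end are over-deleted compared to the true infinite-horizon policy. The name `no_job_left_behind` is hypothetical.

```python
from __future__ import annotations


def no_job_left_behind(q: list[int]) -> list[int]:
    """Slots whose arrivals are deleted on the finite path q (q[0] is the initial state)."""
    n_slots = len(q)
    # future_min[n] = min(q[n+1:]), with +inf when there is no later slot
    future_min = [float("inf")] * n_slots
    for n in range(n_slots - 2, -1, -1):
        future_min[n] = min(q[n + 1], future_min[n + 1])
    return [
        n
        for n in range(1, n_slots)
        if q[n] > q[n - 1]  # slot n is an arrival
        and future_min[n] >= q[n]  # the path never returns to level q[n] - 1
    ]


if __name__ == "__main__":
    q0 = [0, 1, 2, 1, 2, 3, 2, 3, 4]
    print(no_job_left_behind(q0))
```

A single backward pass suffices because the decision for slot $n$ depends only on the minimum of the path after $n$.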

It turns out that the delay performance of $\pi_{NOB}$ is about as good as we
can hope for in heavy traffic, as is formalized in the next theorem.



8 The reason for choosing this name will be made clear in Section 4.1, using the
"stack" interpretation of this policy.
Figure 3.1. Illustration of applying $\pi_{NOB}$ to an initial sample path, $Q^0$, where the
deletions are marked by bold red arrows.

Figure 3.2. The solid lines depict the resulting sample path, $\tilde{Q} = D(Q^0, M^\Psi)$, after
applying $\pi_{NOB}$ to $Q^0$.



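Theorem 9. (Optimal Offline Policies) Fix $p \in (0,1)$.

1. $\pi_{NOB}$ is feasible for all $\lambda \in (1-p, 1)$, and 9

$$C(p, \lambda, \pi_{NOB}) = \frac{1-p}{\lambda - (1-p)}.$$

2. $\pi_{NOB}$ is asymptotically optimal in $\Pi_\infty$ as $\lambda \to 1$:

$$\lim_{\lambda \to 1} C(p, \lambda, \pi_{NOB}) = \lim_{\lambda \to 1} C^*_{\Pi_\infty}(p, \lambda) = \frac{1-p}{p}. \tag{3.4}$$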



9 It is easy to see that $\pi_{NOB}$ is not a very efficient deletion policy for relatively small
values of $\lambda$. In fact, $C(p, \lambda, \pi_{NOB})$ is a decreasing function of $\lambda$. This problem
can be fixed by injecting into the arrival process a Poisson process of "dummy jobs" of rate
$1 - \lambda - \epsilon$, so that the total rate of arrival is $1 - \epsilon$, where $\epsilon \approx 0$. This
reasoning implies that $(1-p)/p$ is a uniform upper-bound of $C^*_{\Pi_\infty}(p, \lambda)$ for all
$\lambda \in (0,1)$.
Proof. See Section 7. □

Remark 1. Heavy-traffic "Delay Collapse". It is perhaps surprising to
observe that the heavy-traffic scaling essentially "collapses" under $\pi_{NOB}$: the
average queue length converges to a finite value, $\frac{1-p}{p}$, as $\lambda \to 1$, which is in
sharp contrast with the optimal scaling of $\sim \log_{\frac{1}{1-p}} \frac{1}{1-\lambda}$ for the online policies,
given by Theorem 7 (see Figure 1.2 for an illustration of this difference). A
"cave" interpretation of the No-Job-Left-Behind policy, to be introduced in
Section 4.2, will help us understand intuitively why such a drastic discrepancy
exists between the online and offline heavy-traffic scaling behaviors.
See the discussion in Section 4.2.1.

Also, as a by-product of Theorem 9, observe that the heavy-traffic limit
scales, in $p$, as

$\lim_{\lambda \to 1} C^*_{\Pi_\infty}(p, \lambda) = \frac{1-p}{p} \sim \frac{1}{p}$, as $p \to 0$.

This is consistent with an intuitive notion of "flexibility": delay should
degenerate as the system's ability to redirect away jobs diminishes.
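
To make the contrast in Remark 1 concrete, the sketch below evaluates the closed-form offline queue length from Theorem 9 next to the leading-order online scaling from Theorem 7. The function names are hypothetical, and `online_leading_order` returns only the asymptotic rate $\log_{\frac{1}{1-p}} \frac{1}{1-\lambda}$, not an exact queue length.

```python
import math


def offline_nob_queue_length(p, lam):
    """Average queue length under the offline policy, (1-p)/(lam-(1-p))."""
    if not (0 < p < 1 and 1 - p < lam < 1):
        raise ValueError("need 0 < p < 1 and 1 - p < lam < 1")
    return (1 - p) / (lam - (1 - p))


def online_leading_order(p, lam):
    """Leading-order online scaling, log base 1/(1-p) of 1/(1-lam)."""
    return math.log(1 / (1 - lam)) / math.log(1 / (1 - p))


# Parameters of Figure 3.3: p = 0.05, lam = 0.999.
offline_value = offline_nob_queue_length(0.05, 0.999)
online_value = online_leading_order(0.05, 0.999)
heavy_traffic_limit = (1 - 0.05) / 0.05
```

With these parameters the offline value sits just above its heavy-traffic limit of 19, while the online leading-order term is several times larger and keeps growing as $\lambda \to 1$.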

Remark 2. Connections to Branching Processes and Erdős-Rényi Random
Graphs. Let $d < 1 < c$ satisfy $de^{-d} = ce^{-c}$. Consider a Galton-Watson
birth process in which each node has $Z$ children, where $Z$ is Poisson with
mean $c$. Conditioning on the finiteness of the process gives a Galton-Watson
process where $Z$ is Poisson with mean $d$. This occurs in the classical analysis
of the Erdős-Rényi random graph $G(n,p)$ with $p = c/n$. There will be a
giant component and the deletion of that component gives a random graph
$G(m,q)$ with $q = d/m$. As a rough analogy, $\pi_{NOB}$ deletes those nodes that
would be in the giant component.
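
The conjugate mean $d$ in Remark 2 has no elementary closed form, but it is easy to compute numerically. The sketch below is an illustration with a hypothetical function name: it uses bisection, relying on the fact that $x e^{-x}$ is increasing on $(0,1)$. It also relies on the standard branching-process identity that $d = c\eta$, where $\eta$ is the extinction probability solving $\eta = e^{-c(1-\eta)}$.

```python
import math


def conjugate_mean(c, tol=1e-12):
    """Solve d * exp(-d) = c * exp(-c) for the root d in (0, 1), given c > 1."""
    if c <= 1:
        raise ValueError("need c > 1")
    target = c * math.exp(-c)
    lo, hi = 0.0, 1.0
    # x * exp(-x) is increasing on (0, 1), so bisection applies.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid * math.exp(-mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


c_example = 2.0
d_example = conjugate_mean(c_example)
eta_example = d_example / c_example  # extinction probability of the Poisson(c) process
```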



3.3. Policies with a Finite Lookahead Window. In practice, infinite prediction
into the future is certainly too much to ask for. In this section,
we show that a natural modification of $\pi_{NOB}$ allows for the same delay
to be achieved, using only a finite lookahead window, whose length, $w(\lambda)$,
increases to infinity as $\lambda \to 1$.$^{10}$

Denote by $w \in \mathbb{R}_+$ the size of the lookahead window in continuous time,
and by $W(n) \in \mathbb{Z}_+$ the window size in the discrete-time embedded process $Q^0$,
starting from slot $n$. Letting $T_n$ be the time of the $n$th event in the system,
then

$W(n) = \sup\{k \in \mathbb{Z}_+ : T_{n+k} < T_n + w\}$. (3.6)

10 In a way, this is not entirely surprising, since $\pi_{NOB}$ leads to a deletion rate of
$\lambda - (1-p)$, and there is an additional $p - [\lambda - (1-p)] = 1 - \lambda$ unused deletion rate that can
be exploited.

Figure 3.3. Example sample paths of $Q^0$ and those obtained after applying $\pi_{th}^{L(p,\lambda)}$ and
$\pi_{NOB}$ to $Q^0$, with $p = 0.05$ and $\lambda = 0.999$.
For $x \in \mathbb{N}$, define the index

$U(Q, n, x) = \inf\{j \in \{1, \ldots, x\} : Q[n+j] = Q[n] - 1\}$, (3.7)

with the convention that the infimum of an empty set is $\infty$.

Definition 10. ($w$-No-Job-Left-Behind Policy) Given an initial
sample path $Q^0$ and $w > 0$, the $w$-No-Job-Left-Behind policy, denoted by
$\pi^w_{NOB}$, deletes all arrivals in the set $\Psi_w$, where

$\Psi_w = \{n \in \Phi(Q^0) : U(Q^0, n, W(n)) = \infty\}$.

It is easy to see that $\pi^w_{NOB}$ is simply $\pi_{NOB}$ applied within the confinement
of a finite window: a job at $t$ is deleted if and only if the initial queue length
process does not return to below the current level within the next $w$ units of
time, assuming no further deletions are made. Since the window is finite, it is
clear that $\Psi_w \supseteq \Psi$ for any $w < \infty$, and hence $C(p, \lambda, \pi^w_{NOB}) \leq C(p, \lambda, \pi_{NOB})$
for all $\lambda \in (1-p, 1)$. The only issue now becomes that of feasibility: by making
decisions based only on a finite lookahead window, we may end up deleting
at a rate greater than $p$.
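
The windowed rule can be illustrated on a finite path in the same spirit. The sketch below makes simplifying assumptions that are not in the text: the window is a fixed number of slots `window` rather than the event-dependent $W(n)$ of Eq. (3.6), the function name is hypothetical, and a window that runs past the end of the recorded path is simply truncated.

```python
def nob_window_deletions(q, window):
    """Slots deleted when only `window` future slots of the path are visible.

    q[n] is the initial queue length after slot n, with steps of +1 or -1.
    Assumption: a constant window in slots stands in for W(n).
    """
    last = len(q) - 1
    deleted = []
    for n in range(1, last + 1):
        if q[n] != q[n - 1] + 1:
            continue  # slot n is not an arrival
        horizon = min(n + window, last)
        # Delete iff the path does not return below q[n] inside the window.
        if all(q[j] >= q[n] for j in range(n + 1, horizon + 1)):
            deleted.append(n)
    return deleted


path = [0, 1, 2, 1, 0, 1]
short_window = nob_window_deletions(path, 1)
long_window = nob_window_deletions(path, 3)
```

On this path the short window deletes the arrival at slot 1, which the longer window spares because it sees the queue drain by slot 4. This is the superset property in miniature: a shorter window can only add deletions, which is why feasibility, not delay, is the concern.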

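The following theorem summarizes the above observations, and gives an
upper bound on the appropriate window size, $w$, as a function of $\lambda$.$^{11}$

Theorem 11. (Optimal Delay Scaling with Finite Lookahead)
Fix $p \in (0,1)$. There exists $C > 0$, such that if

$w(\lambda) = C \cdot \log\frac{1}{1-\lambda}$,

then $\pi^{w(\lambda)}_{NOB}$ is feasible, and

$C(p, \lambda, \pi^{w(\lambda)}_{NOB}) \leq C(p, \lambda, \pi_{NOB}) = \frac{1-p}{\lambda - (1-p)}$. (3.8)

Since $C^*_{\Pi_{w(\lambda)}}(p, \lambda) \geq C^*_{\Pi_\infty}(p, \lambda)$ and $C^*_{\Pi_{w(\lambda)}}(p, \lambda) \leq C(p, \lambda, \pi^{w(\lambda)}_{NOB})$, we also
have that

$\lim_{\lambda \to 1} C^*_{\Pi_{w(\lambda)}}(p, \lambda) = \lim_{\lambda \to 1} C^*_{\Pi_\infty}(p, \lambda) = \frac{1-p}{p}$. (3.9)

Proof. See Section 8.1. □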

3.3.1. Delay-Information Duality. Theorem 11 says that one can attain
the same heavy-traffic delay performance as the optimal offline algorithm,
if the size of the lookahead window scales as $O(\log\frac{1}{1-\lambda})$. Is this the
minimum amount of future information necessary to achieve the same (or
comparable) heavy-traffic delay limit as the optimal offline policy? We
conjecture that this is the case, in the sense that there exists a matching
lower bound, as follows.

Conjecture 1. Fix $p \in (0,1)$. If $w(\lambda) \ll \log\frac{1}{1-\lambda}$ as $\lambda \to 1$, then

$\limsup_{\lambda \to 1} C^*_{\Pi_{w(\lambda)}}(p, \lambda) = \infty$.

In other words, "delay collapse" can occur only if $w(\lambda) = \Theta(\log\frac{1}{1-\lambda})$.

If the conjecture is proven, it would imply a sharp transition in the system's
heavy-traffic delay scaling behavior, around the critical "threshold" of
$w(\lambda) = \Theta(\log\frac{1}{1-\lambda})$. It would also imply the existence of a symmetric dual
relationship between future information and queueing delay: a $\Theta(\log\frac{1}{1-\lambda})$
amount of information is required to achieve a finite delay limit, and one
has to suffer $\Theta(\log\frac{1}{1-\lambda})$ in delay, if only a finite amount of future information
is available.

Figure 3.4. "Delay vs. Information." Best achievable heavy-traffic delay scaling as a function
of the size of the lookahead window, $w$. Results presented in this paper are illustrated
in the solid lines and circles, and the gray dotted line depicts our conjecture of the unknown
regime of $0 < w(\lambda) < \log\frac{1}{1-\lambda}$.

Figure 3.4 summarizes the main results of this paper from the angle of 
the delay-information duality. The dotted line segment marks the unknown 
regime, and the sharp transition at its right end point reflects the view of 
Conjecture 1. 

4. Interpretations of $\pi_{NOB}$. We present two equivalent ways of describing
the No-Job-Left-Behind policy $\pi_{NOB}$. While the interpretations
may be interesting in their own right, they also provide us with operational
insights into the dynamics of the policy. In particular, the stack interpretation
helps us derive the asymptotic deletion rate of $\pi_{NOB}$ in a simple manner,
and the cave interpretation, which takes a time-reversal point of view, shows
us that the set of deletions made by $\pi_{NOB}$ can be calculated efficiently in
linear time (with respect to the length of the time horizon).

4.1. Stack Interpretation. Suppose that the service discipline adopted by
the server is that of last-in-first-out (LIFO), where the server always fetches the
task that has arrived the latest. In other words, the queue works as a stack.
Suppose that we first simulate the stack without any deletion. It is easy
to see that, when the arrival rate $\lambda$ is greater than the service rate $1-p$,
there will be a growing set of jobs at the bottom of the stack that will never
be processed. Label all such jobs as "left-behind". For example, Figure 3.1
shows the evolution of the queue over time, where all "left-behind" jobs are
colored with a blue shade. One can then verify that the policy $\pi_{NOB}$ given in
Definition 8 is equivalent to deleting all jobs that are labeled "left-behind",
hence the name "No-Job-Left-Behind". Figure 3.2 illustrates applying
$\pi_{NOB}$ to a sample path of $Q^0$, where the $i$th job to be deleted is precisely the
$i$th job among all jobs that would have never been processed by the server
under a LIFO policy.

11 Note that Theorem 11 implies Theorem 9 and is hence stronger.
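
The stack interpretation can be sketched as a short simulation. This is a minimal illustration under stated assumptions, not the authors' code: the function name is hypothetical, events are encoded as +1 (arrival) and -1 (service token) per slot, a service token arriving at an empty stack is wasted, and jobs still on the stack at the end of the recorded events are treated as never served.

```python
def left_behind_jobs(events):
    """Simulate a LIFO server and return the arrival slots never served.

    events[k] is +1 for an arrival or -1 for a service token in slot k + 1
    (hypothetical encoding). Jobs remaining on the stack at the end are
    the ones labeled "left-behind" over this finite horizon.
    """
    stack = []
    for slot, step in enumerate(events, start=1):
        if step == 1:
            stack.append(slot)  # newest job goes on top
        elif stack:
            stack.pop()  # LIFO: serve the most recent arrival
    return stack


# Events generating the queue-length path 0, 1, 0, 1, 2, 1, 2, 3.
left_behind = left_behind_jobs([1, -1, 1, 1, -1, 1, 1])
```

On this example the left-behind jobs are exactly the arrivals whose queue level is never undercut later in the path, in line with the equivalence claimed in the text.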

One advantage of the stack interpretation is that it makes obvious the
fact that the deletion rate induced by $\pi_{NOB}$ is equal to $\lambda - (1-p) < p$, as
illustrated in the following lemma.

Lemma 2. For all $\lambda > 1-p$, the following statements hold.

1. With probability one, there exists $T < \infty$, such that every service token
generated after time $T$ is matched with some job. In other words, the
server never idles after some finite time.

2. Let $\tilde{Q} = D(Q^0, M^\Psi)$. We have

$\limsup_{n \to \infty} \frac{1}{n} I(M^\Psi, n) \leq \frac{\lambda - (1-p)}{\lambda + 1 - p}$, a.s.,

which implies that $\pi_{NOB}$ is feasible for all $p \in (0,1)$ and $\lambda \in (1-p, 1)$.

Proof. See Appendix A.1. □

4.2. Cave Interpretation. We now view the sample path of $\{Q^0[n] : n \in \mathbb{N}\}$
as the wall of a cave: the x axis is the "floor", the area above $Q^0$ the "rock",
and the cave opens up towards the right. Now, suppose there is a light source
placed at $n = \infty$, emitting parallel beams of light (illustrated by the blue
shades in Figure 3.1) into the cave from the right. By Definition 8, it is easy
to see that the deletions made by $\pi_{NOB}$ are precisely the areas on the wall
that are "lit" by this light source.

The cave interpretation shows that the deletions made by $\pi_{NOB}$ are arguably
more natural when viewing the process $Q^0$ in reverse. It may be
counter-intuitive that the notion of time should matter, since the problem
is "offline" after all. However, as we will see in the next section, the
time-reverse view leads naturally to an algorithm for computing $M^\Psi$ over a finite
time horizon $n \in \{1, \ldots, N\}$, whose running time scales linearly with respect
to $N$ as $N \to \infty$, and is very simple to describe.
4.2.1. "Anticipation" vs. "Reaction". A key benefit of the cave interpretation
is that it demonstrates the power of $\pi_{NOB}$ being highly anticipatory,
in a geometrically intuitive manner. Looking at Figure 3.1, one sees immediately
that the wall areas "under light" correspond to all the segments where
the initial sample path $Q^0$ is taking a consecutive "upward hike". In other
words, the policy $\pi_{NOB}$ begins to delete jobs precisely when it anticipates
that the arrivals are just about to get intense. Similarly, a wall area will be
"in the shade" only if the wall curves down eventually in the future, which
corresponds to $\pi_{NOB}$ ceasing to delete jobs as soon as it anticipates that
the next few arrivals can be handled by the server alone. In sharp contrast
is the nature of the optimal online policy, $\pi_{th}^{L(p,\lambda)}$, which is by definition
"reactionary" and begins to delete only when the current queue length has
already reached a high level. The differences in the resulting sample paths
are illustrated via simulations in Figure 3.3. For example, as $Q^0$ continues to
increase during the first 1000 time slots, $\pi_{NOB}$ begins deleting immediately
after $t = 0$, while no deletion is made by $\pi_{th}^{L(p,\lambda)}$ during this period.

To summarize this comparison with a rough analogy, the offline policy
starts to delete before the arrivals get busy, but the online policy can only
delete after the burst in arrival traffic has been realized, by which point it is
already "too late" to fully contain the delay. This explains, to a certain extent,
why $\pi_{NOB}$ is capable of achieving "delay collapse" in the heavy-traffic regime
(i.e., a finite limit of delay as $\lambda \to 1$, Theorem 9), while the delay under even
the best online policy diverges to infinity as $\lambda \to 1$ (Theorem 7).

4.2.2. A Linear-time Algorithm for $\pi_{NOB}$. While the offline deletion
problem serves as a nice abstraction, it is impossible to actually store
information about the infinite future in practice, even if such information is
available. A natural finite-horizon version of the offline deletion problem can
be posed as follows: given the values of $Q^0$ over the first $N$ slots, where $N$ is
finite, one would like to compute the set of deletions made by $\pi_{NOB}$,

$M^\Psi_N = M^\Psi \cap \{1, \ldots, N\}$,

assuming that $Q^0[n] \geq Q^0[N]$ for all $n > N$. Note that this problem also
arises in computing the sites of deletions for the $\pi^w_{NOB}$ policy, where one
would replace $N$ with the length of the lookahead window, $w$.

We have the following algorithm, which identifies all slots on which a new 
"minimum" is achieved in Q°, when viewed in the reverse order of time. 
Note that these are precisely the slots "under light" according to the cave 
interpretation (Section 4.2). 

A Linear-time Algorithm for $\pi_{NOB}$

S ← Q^0[N], and M^Ψ ← ∅
for n = N down to 1 do
    if Q^0[n] < S then
        M^Ψ ← M^Ψ ∪ {n + 1}
        S ← Q^0[n]
    else
        (leave S and M^Ψ unchanged)
    end if
end for
return M^Ψ



It is easy to see that the running time of the above algorithm scales
linearly with respect to the length of the time horizon, $N$. Note that this
is not the unique linear-time algorithm. In fact, one can verify that the
simulation procedure used in describing the stack interpretation of $\pi_{NOB}$
(Section 4.1), which keeps track of which jobs would eventually be served, is
itself a linear-time algorithm. However, the time-reverse version given here
is arguably more intuitive and simpler to describe.
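
A runnable rendering of the time-reverse scan is sketched below. It follows the pseudocode as written, with assumptions that are not in the text: the function name is hypothetical, the path is a Python list `q` with `q[n]` equal to $Q^0[n]$ for $n = 0, \ldots, N$, and the running minimum is updated whenever a new minimum is found, which is what the "new minimum in reverse time" description requires.

```python
def nob_deletions_linear(q):
    """Time-reverse scan returning the deleted slots in increasing order.

    q[n] is the initial queue length after slot n, n = 0..N (hypothetical
    input format). As in the pseudocode, the loop stops at n = 1, so the
    smallest slot it can report is 2.
    """
    last = len(q) - 1
    running_min = q[last]
    deleted = []
    for n in range(last, 0, -1):
        if q[n] < running_min:
            # A new minimum in reverse time: slot n + 1 is "under light".
            deleted.append(n + 1)
            running_min = q[n]
    deleted.reverse()
    return deleted


scan_result = nob_deletions_linear([0, 1, 0, 1, 2, 1, 2, 3])
```

Each slot is visited once and only a single running minimum is stored, so both time and extra memory beyond the output are linear or better in $N$.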



5. Applications to Resource Pooling. We discuss in this section
some of the implications of our results in the context of a multi-server model
for resource pooling [17], illustrated in Figure 5.1, which has partially
motivated our initial inquiry.

We briefly review the model in [17] below, and the reader is referred to
the original paper for a more rigorous description. Fix a coefficient $p \in [0,1]$.
The system consists of $N$ stations, each of which receives an arrival stream
of jobs at rate $\lambda \in (0,1)$ and has one queue to store the unprocessed jobs.
The system has a total processing capacity of $N$ jobs per unit
time, which is divided between two types of servers. Each queue is equipped
with a local server of rate $1-p$, which is capable of serving only the jobs
directed to the respective station. All stations share a central server of rate
$pN$, which always fetches a job from the most loaded station, following a
Longest-Queue-First (LQF) scheduling policy. In other words, a fraction $p$ of
the total processing resources is being pooled in a centralized fashion, while
the remainder is distributed across individual stations. All arrival and service
token generation processes are assumed to be Poisson and independent from
one another (same as in Section 2).

A main result of [17] is that even a small amount of resource pooling
(small but positive $p$) can have significant benefits over a fully distributed
system ($p = 0$). In particular, for any $p > 0$, and in the limit as the system size
$N \to \infty$, the average delay across the whole system scales as $\sim \log_{\frac{1}{1-p}} \frac{1}{1-\lambda}$,
as $\lambda \to 1$ (note this is the same scaling as in Theorem 7). This is an
exponential improvement over the scaling of $\sim \frac{1}{1-\lambda}$ when no resource pooling is
implemented, i.e., $p = 0$.








Figure 5.1. Illustration of a model for resource pooling with distributed and centralized 
resources, [17]. 




Figure 5.2. Resource pooling using a central queue. 

We next explain how our problem is intimately connected to the resource
pooling model described above, and how the current paper suggests that the
results in [17] can be extended along several directions. Consider a similar
$N$-station system as in [17], with the only difference being that instead of the
central server fetching jobs from the local stations, the central server simply
fetches jobs from a "central queue", which stores jobs redirected from the
local stations (see Figure 5.2). Denote by $\{R_i(t) : t \in \mathbb{R}_+\}$, $i \in \{1, \ldots, N\}$, the
counting process where $R_i(t)$ is the cumulative number of jobs redirected to
the central queue from station $i$ by time $t$. Assume that $\limsup_{t \to \infty} \frac{1}{t} R_i(t) = p - \epsilon$
almost surely for all $i \in \{1, \ldots, N\}$, for some $\epsilon > 0$.$^{12}$

From the perspective of the central queue, it receives an arrival stream
$R^N$, created by merging $N$ redirection streams, $R^N(t) = \sum_{i=1}^{N} R_i(t)$. The
process $R^N$ is of rate $(p - \epsilon)N$, and it is served by a service token generation
process of rate $pN$. The traffic intensity of the central queue (arrival rate
divided by service rate) is therefore $\rho_c = (p - \epsilon)N / (pN) = 1 - \epsilon/p < 1$. Denote
by $Q^N \in \mathbb{Z}_+$ the length of the central queue in steady state. Suppose that it
can be shown that$^{13}$



A key consequence of Eq. (5.1) is that, for large values of $N$, $Q^N$ becomes
negligible in the calculation of the system's average queue length: the average
queue length across the whole system coincides with the average queue
length among the local stations, as $N \to \infty$. In particular, this implies that,
in the limit of $N \to \infty$, the task of scheduling for the resource pooling system
could alternatively be implemented by running a separate admissions
control mechanism, with the rate of redirection equal to $p - \epsilon$, where all
redirected jobs are sent to the central queue, provided that the streams of
redirected jobs ($R_i(t)$) are sufficiently well-behaved so that Eq. (5.1) holds.
This is essentially the justification for the equivalence between the resource
pooling and admissions control problems, discussed at the beginning of this
paper (Section 1.2).

With this connection in mind, several implications follow readily from
the results in the current paper, two of which are given below.

1. The original LQF scheduling policy employed by the central server in
[17] is centralized: each fetching decision of the central server requires
the full knowledge of the queue lengths at all local stations. However,
Theorem 7 suggests that the same system-wide delay scaling in
the resource pooling scenario could also be achieved by a distributed
implementation: each server simply runs the same threshold policy,
$\pi_{th}^{L(p-\epsilon,\lambda)}$, and routes all deleted jobs to the central queue. To prove
this rigorously, one needs to establish the validity of Eq. (5.1), which
we will leave as future work.

12 Since the central server runs at rate $pN$, the rate of $R_i(t)$ cannot exceed $p$, assuming
it is the same across all $i$.

13 For an example where this is true, assume that every local station adopts a randomized
rule and redirects an incoming job to the central queue with probability $\frac{p-\epsilon}{\lambda}$ (and
that $\lambda$ is sufficiently close to 1 so that $\frac{p-\epsilon}{\lambda} \in (0,1)$). Then $R_i(t)$ is a Poisson process, and
by the merging property of Poisson processes, so is $R^N(t)$. This implies that the central
queue is essentially an M/M/1 queue with traffic intensity $\rho_c = (p-\epsilon)/p$, and we have
that $E(Q^N) = \frac{\rho_c}{1-\rho_c}$ for all $N$.
2. A fairly tedious stochastic coupling argument was employed in [17] to
establish a matching lower bound for the $\sim \log_{\frac{1}{1-p}} \frac{1}{1-\lambda}$ delay scaling,
by showing that the performance of the LQF policy is no worse than
any other online policy. Instead of using stochastic coupling, the lower
bound in Theorem 7 immediately implies a lower bound for the resource
pooling problem in the limit of $N \to \infty$, if one assumes that
the central server adopts a symmetric scheduling policy, where it
does not distinguish between two local stations beyond their queue
lengths.$^{14}$ To see this, note that the rates of $R_i(t)$ are identical under
any symmetric scheduling policy, which implies that they must be less
than $p$ for all $i$. Therefore, the lower bound derived for the admissions
control problem on a single queue with a redirection rate of $p$ automatically
carries over to the resource pooling problem. Note that, unlike
the previous item, this lower bound does not rely on the validity of
Eq. (5.1).

Both observations above exploit the equivalence of the two problems in
the regime of $N \to \infty$. With the same insight, one could also potentially
generalize the delay scaling results in [17] to scenarios where the arrival
rates to the local stations are non-uniform, or where future information is
available. Both extensions seem difficult to accomplish using the original
framework of [17], which is based on a fluid model that heavily exploits
the symmetry in the system. On the downside, however, the results in this
paper tell us very little when the system size $N$ is small, in which case it is
highly conceivable that a centralized scheduling rule, such as the
Longest-Queue-First policy, can outperform a collection of decentralized admissions
control rules.

6. Optimal Online Policies. Starting from this section and through
Section 8, we present the proofs of the results stated in Section 3.

We begin by showing Theorem 7, formulating the online problem as
a Markov decision problem (MDP) with an average cost constraint, which
then enables us to use existing results to characterize the form of optimal
policies. Once the family of threshold policies has been shown to achieve
the optimal delay scaling in $\Pi_0$ under heavy traffic, the exact form of the
scaling can be obtained in a fairly straightforward manner from the
steady-state distribution of a truncated birth-death process.

14 This is a natural family of policies to study, since all local servers, with the same
arrival and service rate, are indeed identical.

6.1. A Markov Decision Problem Formulation. Since both the arrival
and service processes are Poisson, we can formulate the problem of finding
an optimal policy in $\Pi_0$ as a continuous-time Markov decision problem with
an average-cost constraint, as follows. Let $\{Q(t) : t \in \mathbb{R}_+\}$ be the resulting
continuous-time queue length process after applying some policy in $\Pi_0$ to
$Q^0$. Let $T_k$ be the $k$th upward jump in $Q$ and $\tau_k$ the length of the $k$th
inter-jump interval, $\tau_k = T_k - T_{k-1}$. The task of a deletion policy, $\pi \in \Pi_0$, amounts
to choosing, for each inter-jump interval, an action, $a_k \in [0,1]$,
where the value of $a_k$ corresponds to the probability that the next arrival
during the current inter-jump interval will be admitted (this reading matches
the cost function (6.2) and Lemma 3 below). Define $R$ and $K$ to
be the reward and cost functions of an inter-jump interval, respectively,

$R(Q_k, a_k, \tau_k) = -Q_k \cdot \tau_k$, (6.1)
$K(Q_k, a_k, \tau_k) = \lambda (1 - a_k) \tau_k$, (6.2)

where $Q_k = Q(T_k)$. The corresponding MDP seeks to maximize the
time-average reward

$R_\pi = \liminf_{n \to \infty} \frac{E_\pi\left(\sum_{k=1}^{n} R(Q_k, a_k, \tau_k)\right)}{E_\pi\left(\sum_{k=1}^{n} \tau_k\right)}$ (6.3)

while obeying the average-cost constraint

$C_\pi = \limsup_{n \to \infty} \frac{E_\pi\left(\sum_{k=1}^{n} K(Q_k, a_k, \tau_k)\right)}{E_\pi\left(\sum_{k=1}^{n} \tau_k\right)} \leq p$. (6.4)

To see why this MDP solves our deletion problem, observe that $R_\pi$ is the
negative of the time-average queue length, and $C_\pi$ is the time-average
deletion rate.

It is well known that the type of constrained MDP described above admits
an optimal policy that is stationary [1], which means that the action $a_k$
depends solely on the current state, $Q_k$, and is independent of the time index $k$.
Therefore, it suffices to describe it using a sequence, $\{b_q : q \in \mathbb{Z}_+\}$, such that
$a_k = b_q$ whenever $Q_k = q$. Moreover, when the state space is finite$^{15}$, stronger
characterizations of the $b_q$'s have been obtained for a family of reward and
cost functions under certain regularity assumptions (Hypotheses 2.7, 3.1 and
4.1 in [2]), which ours do satisfy (Eqs. (6.1) and (6.2)). Theorem 7 will be
proved using the next known result (adapted from Theorem 4.4 in [2]):

15 This corresponds to a finite buffer size in our problem, where one can assume that
the next arrival is automatically deleted when the buffer is full, independent of the value
of $a_k$.
Lemma 3. Fix $p$ and $\lambda$, and let the buffer size $B$ be finite. There exists
an optimal stationary policy, $\{b^*_q\}$, of the form

$b^*_q = 1$ if $q < L^* - 1$; $\quad b^*_q = \xi$ if $q = L^* - 1$; $\quad b^*_q = 0$ if $q \geq L^*$,

for some $L^* \in \mathbb{Z}_+$ and $\xi \in [0,1]$.
6.2. Proof of Theorem 7. 

Proof. (Theorem 7) In words, Lemma 3 states that the optimal policy
admits a "quasi-threshold" form: it deletes the next arrival when $Q(t) \geq L^*$,
admits when $Q(t) < L^* - 1$, and admits with probability $\xi$ when $Q(t) = L^* - 1$.
Suppose, for the moment, that the statements of Lemma 3 also hold when
the buffer size is infinite, an assumption to be justified by the end of the
proof. Denote by $\pi^*_p$ the stationary optimal policy associated with $\{b^*_q\}$,
when the constraint on the average rate of deletion is $p$ (Eq. (6.4)). The evolution
of $Q(t)$ under $\pi^*_p$ is that of a birth-death process truncated at state $L^*$,
with the transition rates given in Figure 6.1, and the time-average queue
length is equal to the expected queue length in steady state. Using standard
calculations involving the steady-state distribution of the induced Markov
process, it is not difficult to verify that

$C(p, \lambda, \pi_{th}^{L^*-1}) \leq C(p, \lambda, \pi^*_p) \leq C(p, \lambda, \pi_{th}^{L^*})$, (6.5)

where $L^*$ is defined as in Lemma 3, and $C(p, \lambda, \pi)$ is the time-average queue
length under policy $\pi$, defined in Eq. (2.9).




Figure 6.1. The truncated birth-death process induced by $\pi^*_p$.

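Denote by $\{\mu_i^L : i \in \mathbb{N}\}$ the steady-state probability of the queue length
being equal to $i$, under a threshold policy $\pi_{th}^L$. Assuming $\lambda \neq 1-p$, standard
calculations using the balance equations yield

$\mu_i^L = \theta^i \cdot \frac{\theta - 1}{\theta^{L+1} - 1}$, for all $0 \leq i \leq L$, (6.6)

and $\mu_i^L = 0$ for all $i \geq L + 1$. The time-average queue length is given by

$C(p, \lambda, \pi_{th}^L) = \sum_{i=1}^{L} i \mu_i^L = \frac{\theta}{(\theta-1)(\theta^{L+1}-1)} \left[1 - \theta^L + L\theta^L(\theta - 1)\right]$, (6.7)

where $\theta = \frac{\lambda}{1-p}$. Note that when $\lambda > 1-p$, $\mu_i^L$ is decreasing with respect to
$L$ for all $i \in \{0, 1, \ldots, L\}$ (Eq. (6.6)), which implies that the time-average
queue length is monotonically increasing in $L$, i.e.,

$C(p, \lambda, \pi_{th}^{L+1}) - C(p, \lambda, \pi_{th}^{L})$
$= (L+1) \cdot \mu_{L+1}^{L+1} + \sum_{i=0}^{L} i \cdot (\mu_i^{L+1} - \mu_i^L)$
$\geq (L+1) \cdot \mu_{L+1}^{L+1} + L \cdot (1 - \mu_{L+1}^{L+1} - 1)$
$= \mu_{L+1}^{L+1} > 0$. (6.8)

It is also easy to see that, whenever $\theta > 1$,

$C(p, \lambda, \pi_{th}^L) \sim L$, as $L \to \infty$. (6.9)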



Since deletions only occur when $Q(t)$ is in state $L$, from Eq. (6.6), the
average rate of deletions in continuous time under $\pi_{th}^L$ is given by

$r_d(p, \lambda, \pi_{th}^L) = \lambda \cdot \mu_L^L = \lambda \cdot \frac{\theta^L(\theta - 1)}{\theta^{L+1} - 1}$. (6.10)

Define

$L(x, \lambda) = \min\{L \in \mathbb{Z}_+ : r_d(p, \lambda, \pi_{th}^L) \leq x\}$, (6.11)

that is, $L(x, \lambda)$ is the smallest $L$ for which $\pi_{th}^L$ remains feasible, given a
deletion rate constraint of $x$. Using Eqs. (6.10) and (6.11) to solve for $L(p, \lambda)$,
we obtain, after some algebra,

$L(p, \lambda) = \left\lceil \log_{\frac{\lambda}{1-p}} \frac{p}{1-\lambda} \right\rceil - 1$, (6.12)

and, by combining Eq. (6.12) and Eq. (6.9) with $L = L(p, \lambda)$, we have

$C(p, \lambda, \pi_{th}^{L(p,\lambda)}) \sim L(p, \lambda) \sim \log_{\frac{1}{1-p}} \frac{1}{1-\lambda}$, as $\lambda \to 1$. (6.13)
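
The quantities in this part of the proof can be checked numerically. The sketch below is an illustration with hypothetical function names: it builds the stationary distribution of the truncated birth-death chain (birth rate $\lambda$ below the threshold, death rate $1-p$, weights proportional to $\theta^i$ with $\theta = \lambda/(1-p)$), and finds the smallest feasible threshold by direct search rather than by a closed form.

```python
def threshold_stationary(p, lam, L):
    """Stationary probabilities of states 0..L under a threshold-L policy."""
    theta = lam / (1 - p)
    weights = [theta ** i for i in range(L + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def deletion_rate(p, lam, L):
    """Deletions happen only in state L, at the arrival rate lam."""
    return lam * threshold_stationary(p, lam, L)[L]


def average_queue_length(p, lam, L):
    """Expected queue length in steady state under threshold L."""
    return sum(i * m for i, m in enumerate(threshold_stationary(p, lam, L)))


def smallest_feasible_threshold(p, lam):
    """Smallest L whose deletion rate does not exceed p (assumes lam < 1)."""
    L = 0
    while deletion_rate(p, lam, L) > p:
        L += 1
    return L


best_L = smallest_feasible_threshold(0.5, 0.9)
queue_at_best_L = average_queue_length(0.5, 0.9, best_L)
```

For $p = 0.5$ and $\lambda = 0.9$ the search stops at $L = 2$, since the deletion rate is about 0.579 at $L = 1$ and about 0.483 at $L = 2$. The same functions can be used to observe that the average queue length grows with $L$.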

By Eqs. (6.8) and (6.11), we know that $\pi_{th}^{L(p,\lambda)}$ achieves the minimum
average queue length among all feasible threshold policies. By Eq. (6.5), we
must have that

$C(p, \lambda, \pi_{th}^{L(p,\lambda)-1}) \leq C(p, \lambda, \pi^*_p) \leq C(p, \lambda, \pi_{th}^{L(p,\lambda)})$. (6.14)

Since Lemma 3 only applies when $B < \infty$, Eq. (6.14) holds whenever the
buffer size, $B$, is greater than $L(p, \lambda)$ but finite. We next extend Eq. (6.14)
to the case of $B = \infty$. Denote by $\nu^*_p$ a stationary optimal policy, when $B = \infty$
and the constraint on the average deletion rate is equal to $p$ (Eq. (6.4)). The
upper bound on $C(p, \lambda, \pi^*_p)$ in Eq. (6.14) automatically holds for $C(p, \lambda, \nu^*_p)$,
since $\pi_{th}^{L(p,\lambda)}$ is still feasible when $B = \infty$. It remains to show a lower
bound of the form

$C(p, \lambda, \nu^*_p) \geq C(p, \lambda, \pi_{th}^{L(p,\lambda)-2})$ (6.15)

when $B = \infty$, which, together with the upper bound, implies that
the scaling of $C(p, \lambda, \pi_{th}^{L(p,\lambda)})$ (Eq. (6.13)) carries over to $\nu^*_p$,

$C(p, \lambda, \nu^*_p) \sim C(p, \lambda, \pi_{th}^{L(p,\lambda)}) \sim \log_{\frac{1}{1-p}} \frac{1}{1-\lambda}$, as $\lambda \to 1$, (6.16)

thus proving Theorem 7.

To show Eq. (6.15), we will use a straightforward truncation argument
that relates the performance of an optimal policy under $B = \infty$ to the case
of $B < \infty$. Denote by $\{b^*_q\}$ the admission probabilities of a stationary optimal
policy, $\nu^*_p$, and by $\{b^*_q(B')\}$ the admission probabilities for a truncated version,
$\nu^*_p(B')$, with

$b^*_q(B') = \mathbb{I}(q \leq B') \cdot b^*_q$,

for all $q \geq 0$. Since $\nu^*_p$ is optimal and yields the minimum average queue
length, it is without loss of generality to assume that the Markov process
for $Q(t)$ induced by $\nu^*_p$ is positive recurrent. Denoting by $\{\mu^*_i\}$ and $\{\mu^*_i(B')\}$
the steady-state probability of the queue length being equal to $i$ under $\nu^*_p$ and
$\nu^*_p(B')$, respectively, it follows from the positive recurrence of $Q(t)$ under
$\nu^*_p$ and some algebra, that

$\lim_{B' \to \infty} \mu^*_i(B') = \mu^*_i$, (6.17)

for all $i \in \mathbb{Z}_+$, and

$\lim_{B' \to \infty} C(p, \lambda, \nu^*_p(B')) = C(p, \lambda, \nu^*_p)$. (6.18)

By Eq. (6.17) and the fact that $b^*_i(B') = b^*_i$ for all $0 \leq i \leq B'$, we have that$^{16}$

$\lim_{B' \to \infty} r_d(p, \lambda, \nu^*_p(B')) = \lim_{B' \to \infty} \lambda \sum_{i=0}^{\infty} \mu^*_i(B') \cdot (1 - b^*_i(B')) = r_d(p, \lambda, \nu^*_p) \leq p$. (6.19)

It is not difficult to verify, from the definition of $L(p, \lambda)$ (Eq. (6.11)), that

$\lim_{\delta \downarrow 0} L(p + \delta, \lambda) \geq L(p, \lambda) - 1$,

for all $p$, $\lambda$. For all $\delta > 0$, choose $B'$ to be sufficiently large, so that

$C(p, \lambda, \nu^*_p(B')) \leq C(p, \lambda, \nu^*_p) + \delta$, (6.20)
$L(r_d(p, \lambda, \nu^*_p(B')), \lambda) \geq L(p, \lambda) - 1$. (6.21)

Let $p' = r_d(p, \lambda, \nu^*_p(B'))$. Since $b^*_i(B') = 0$ for all $i \geq B' + 1$, by Eq. (6.21)
we have

$C(p, \lambda, \nu^*_p(B')) \geq C(p, \lambda, \pi^*_{p'})$, (6.22)

where $\pi^*_{p'}$ is the optimal stationary policy given in Lemma 3 under any
finite buffer size $B > B'$. We have

$C(p, \lambda, \nu^*_p) + \delta$
$\overset{(a)}{\geq} C(p, \lambda, \nu^*_p(B'))$
$\overset{(b)}{\geq} C(p, \lambda, \pi^*_{p'})$
$\overset{(c)}{\geq} C(p, \lambda, \pi_{th}^{L(p',\lambda)-1})$
$\overset{(d)}{\geq} C(p, \lambda, \pi_{th}^{L(p,\lambda)-2})$, (6.23)

where the inequalities (a) through (d) follow from Eqs. (6.20), (6.22), (6.14),
and (6.21), respectively. Since Eq. (6.23) holds for all $\delta > 0$, we have proven
Eq. (6.15). This completes the proof of Theorem 7. □



16 Note that, in general, $r_d(p, \lambda, \nu^*_p(B'))$ could be greater than $p$, for any finite $B'$.
7. Optimal Offline Policies. We prove Theorem 9 in this section,
in two parts. In the first part (Section 7.2), we give a full
characterization of the sample path that results from applying $\pi_{NOB}$ (Proposition
1), which turns out to be a recurrent random walk. This allows us to obtain
the steady-state distribution of the queue length under $\pi_{NOB}$ in closed
form. From this, the expected queue length, which is equal to the
time-average queue length, $C(p, \lambda, \pi_{NOB})$, can be easily derived and is shown to
be $\frac{1-p}{\lambda - (1-p)}$. Several side results we obtain along this path will also be used
in subsequent sections.

The second part of the proof (Section 7.3) focuses on showing the
heavy-traffic optimality of $\pi_{NOB}$ among the class of all feasible offline policies,
namely, that $\lim_{\lambda \to 1} C(p, \lambda, \pi_{NOB}) = \lim_{\lambda \to 1} C^*_{\Pi_\infty}(p, \lambda)$, which, together with
the first part, proves Theorem 9 (Section 7.4). The optimality result is proved
using a sample-path-based analysis, by relating the resulting queue length
sample path of $\pi_{NOB}$ to that of a greedy deletion rule, which has an optimal
deletion performance over a finite time horizon, $\{1, \ldots, N\}$, given any
initial sample path. We then show that the discrepancy between $\pi_{NOB}$ and
the greedy policy, in terms of the resulting time-average queue length after
deletion, diminishes almost surely as $N \to \infty$ and $\lambda \to 1$ (with the two limits
taken in this order). This establishes the heavy-traffic optimality of $\pi_{NOB}$.

7.1. Additional Notation. Define $\tilde{Q}$ as the resulting queue length process
after applying $\pi_{NOB}$,

$\tilde{Q} = D(Q^0, M^\Psi)$,

and $Q$ as the shifted version of $\tilde{Q}$, so that $Q$ starts from the first deletion in
$\tilde{Q}$,

$Q[n] = \tilde{Q}[n + m_1^\Psi]$, $n \in \mathbb{Z}_+$. (7.1)

We say that $B = \{l, \ldots, u\} \subset \mathbb{N}$ is a busy period of $Q$, if

$Q[l-1] = Q[u] = 0$, and $Q[n] > 0$ for all $n \in \{l, \ldots, u-1\}$. (7.2)

We may write $B_j = \{l_j, \ldots, u_j\}$ to mean the $j$th busy period of $Q$. An
example of a busy period is illustrated in Figure 3.2.

Finally, we will refer to the set of slots between two adjacent deletions in
$Q$ (note the offset of $m_1$),

$E_i = \{m_i - m_1, m_i + 1 - m_1, \ldots, m_{i+1} - 1 - m_1\}$, (7.3)

as the $i$th deletion epoch.
7.2. Performance of the No-Job-Left-Behind Policy. For simplicity of
notation, throughout this section, we will denote by $M = \{m_i : i \in \mathbb{N}\}$ the
deletion sequence generated by applying $\pi_{NOB}$ to $Q^0$, when there is no
ambiguity (as opposed to using $M^\Psi$ and $m_i^\Psi$). The following lemma summarizes
some important properties of $Q$ which will be used repeatedly.

Lemma 4. Suppose $1 > \lambda > 1-p > 0$. The following hold with probability
one.

1. For all $n \in \mathbb{N}$, we have $Q[n] = Q^0[n + m_1] - I(M, n + m_1)$.

2. For all $n \in \mathbb{N}$, we have $n = m_i - m_1$ for some $i \in \mathbb{N}$, if and only if

$Q[n] = Q[n-1] = 0$, (7.4)

with the convention that $Q[-1] = 0$. In other words, the appearance of
two consecutive zeros in $Q$ is equivalent to having a deletion on the
second zero.

3. $Q[n] \in \mathbb{Z}_+$ for all $n \in \mathbb{Z}_+$.

Proof. See Appendix A.2. □

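The next proposition is the main result of this subsection. It specifies the
probability law that governs the evolution of $\tilde{Q}$.

Proposition 1. $\{\tilde{Q}[n] : n \in \mathbb{Z}_+\}$ is a random walk on $\mathbb{Z}_+$, with $\tilde{Q}[0] = 0$,
and, for all $n \in \mathbb{N}$ and $x_1, x_2 \in \mathbb{Z}_+$,

\[ P(\tilde{Q}[n+1] = x_2 \mid \tilde{Q}[n] = x_1) =
\begin{cases}
\frac{1-p}{\lambda+1-p}, & x_2 - x_1 = 1, \\
\frac{\lambda}{\lambda+1-p}, & x_2 - x_1 = -1, \\
0, & \text{otherwise},
\end{cases} \]

if $x_1 > 0$, and

\[ P(\tilde{Q}[n+1] = x_2 \mid \tilde{Q}[n] = x_1) =
\begin{cases}
\frac{1-p}{\lambda+1-p}, & x_2 - x_1 = 1, \\
\frac{\lambda}{\lambda+1-p}, & x_2 - x_1 = 0, \\
0, & \text{otherwise},
\end{cases} \]

if $x_1 = 0$.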



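Proof. For a sequence $\{X[n] : n \in \mathbb{N}\}$ and $s, t \in \mathbb{N}$, $s \le t$, we will use the
short-hand

\[ X_s^t = \{X[s], \ldots, X[t]\}. \]

Fix $n \in \mathbb{N}$, and a sequence $(q_1, \ldots, q_n) \subset \mathbb{Z}_+^n$. We have

\[ \begin{aligned}
& P(\tilde{Q}[n] = q[n] \mid \tilde{Q}_1^{n-1} = q_1^{n-1}) \\
&= \sum_{k=1}^{n} \sum_{t_1, \ldots, t_k :\, t_k \le n-1+t_1}
P(\tilde{Q}[n] = q[n] \mid \tilde{Q}_1^{n-1} = q_1^{n-1},\; m_1^k = t_1^k,\; m_{k+1} \ge n + t_1) \\
&\qquad \cdot P(m_1^k = t_1^k,\; m_{k+1} \ge n + t_1 \mid \tilde{Q}_1^{n-1} = q_1^{n-1}).
\end{aligned} \tag{7.5} \]

Restricting to the values of the $t_i$'s and $q[i]$'s under which the summand is
non-zero, the first factor in the summand can be written as

\[ \begin{aligned}
& P(\tilde{Q}[n] = q[n] \mid \tilde{Q}_1^{n-1} = q_1^{n-1},\; m_1^k = t_1^k,\; m_{k+1} \ge n + t_1) \\
&= P(Q[n + m_1] = q[n] \mid Q_{1+m_1}^{n-1+m_1} = q_1^{n-1},\; m_1^k = t_1^k,\; m_{k+1} \ge n + t_1) \\
&\overset{(a)}{=} P\Big(Q^0[n + t_1] = q[n] + k \,\Big|\, Q^0[s + t_1] = q[s] + I(\{t_i\}_{i=1}^k, s + t_1),\ \forall 1 \le s \le n-1,
\text{ and } \min_{r \ge n+t_1} Q^0[r] \ge k\Big) \\
&\overset{(b)}{=} P\Big(Q^0[n + t_1] = q[n] + k \,\Big|\, Q^0[n-1+t_1] = q[n-1] + k, \text{ and } \min_{r \ge n+t_1} Q^0[r] \ge k\Big),
\end{aligned} \tag{7.6} \]

where $\tilde{Q}$ was defined in Eq. (7.1). Step (a) follows from Lemma 4 and the
fact that $t_k \le n - 1 + t_1$, and (b) from the Markov property of $Q^0$ and
the fact that the events $\{\min_{r \ge n+t_1} Q^0[r] \ge k\}$, $\{Q^0[n + t_1] = q[n] + k\}$, and
their intersection, depend only on the values of $\{Q^0[s] : s \ge n + t_1\}$, and are
hence independent of $\{Q^0[s] : 1 \le s \le n - 2 + t_1\}$ conditional on the value of
$Q^0[t_1 + n - 1]$.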

Since the process $\tilde{Q}$ lives in $\mathbb{Z}_+$ (Lemma 4), it suffices to consider the case
of $q[n] = q[n-1] + 1$, and show that

\[ P\Big(Q^0[n + t_1] = q[n-1] + 1 + k \,\Big|\, Q^0[n-1+t_1] = q[n-1] + k,
\text{ and } \min_{r \ge n+t_1} Q^0[r] \ge k\Big) = \frac{1-p}{\lambda+1-p}, \tag{7.7} \]

for all $q[n-1] \in \mathbb{Z}_+$. Since $\tilde{Q}[m_i - m_1] = \tilde{Q}[m_i - 1 - m_1] = 0$ for all $i$ (Lemma
4), the fact that $q[n] = q[n-1] + 1 > 0$ implies that

\[ n < m_{k+1} - 1 + m_1. \tag{7.8} \]

Moreover, since $Q^0[m_{k+1} - 1] = k$ and $n < m_{k+1} - 1 + m_1$, we have that

\[ q[n] > 0 \text{ implies } Q^0[t] = k, \text{ for some } t \ge n + 1 + m_1. \tag{7.9} \]

We consider two cases, depending on the value of $q[n-1]$.

Case 1: $q[n-1] > 0$. Using the same argument that led to Eq. (7.9), we
have that

\[ q[n-1] > 0 \text{ implies } Q^0[t] = k, \text{ for some } t \ge n + m_1. \tag{7.10} \]

It is important to note that, despite the similarity in conclusions, Eqs. (7.9)
and (7.10) are different in their assumptions (i.e., $q[n]$ versus $q[n-1]$). We
have

\[ \begin{aligned}
& P\Big(Q^0[n + t_1] = q[n-1] + 1 + k \,\Big|\, Q^0[n-1+t_1] = q[n-1] + k, \text{ and } \min_{r \ge n+t_1} Q^0[r] \ge k\Big) \\
&\overset{(a)}{=} P\Big(Q^0[n + t_1] = q[n-1] + 1 + k \,\Big|\, Q^0[n-1+t_1] = q[n-1] + k, \text{ and } \min_{r \ge n+t_1} Q^0[r] = k\Big) \\
&\overset{(b)}{=} P\Big(Q^0[2] = q[n-1] + 1 \,\Big|\, Q^0[1] = q[n-1], \text{ and } \min_{r \ge 2} Q^0[r] = 0\Big) \\
&\overset{(c)}{=} \frac{1-p}{\lambda+1-p},
\end{aligned} \tag{7.11} \]

where (a) follows from Eq. (7.10), (b) from the stationarity and space-homogeneity
of the Markov chain $Q^0$, and (c) from the following well-known property of
a transient random walk conditioned on returning to zero.

Lemma 5. Let $\{X[n] : n \in \mathbb{N}\}$ be a random walk on $\mathbb{Z}_+$, such that for all
$x_1, x_2 \in \mathbb{Z}_+$ and $n \in \mathbb{N}$,

\[ P(X[n+1] = x_2 \mid X[n] = x_1) =
\begin{cases}
q, & x_2 - x_1 = 1, \\
1 - q, & x_2 - x_1 = -1, \\
0, & \text{otherwise},
\end{cases} \]

if $x_1 > 0$, and

\[ P(X[n+1] = x_2 \mid X[n] = x_1) =
\begin{cases}
q, & x_2 - x_1 = 1, \\
1 - q, & x_2 - x_1 = 0, \\
0, & \text{otherwise},
\end{cases} \]

if $x_1 = 0$, where $q \in (\tfrac{1}{2}, 1)$. Then for all $x_1, x_2 \in \mathbb{Z}_+$ and $n \in \mathbb{N}$,

\[ P\Big(X[n+1] = x_2 \,\Big|\, X[n] = x_1,\ \min_{r \ge n+1} X[r] = 0\Big) =
\begin{cases}
1 - q, & x_2 - x_1 = 1, \\
q, & x_2 - x_1 = -1, \\
0, & \text{otherwise},
\end{cases} \]

if $x_1 > 0$, and

\[ P\Big(X[n+1] = x_2 \,\Big|\, X[n] = x_1,\ \min_{r \ge n+1} X[r] = 0\Big) =
\begin{cases}
1 - q, & x_2 - x_1 = 1, \\
q, & x_2 - x_1 = 0, \\
0, & \text{otherwise},
\end{cases} \]

if $x_1 = 0$. In other words, conditional on the eventual return to 0 and before
it happens, a transient random walk obeys the same probability law as a
random walk with the reversed one-step transition probability.

Proof. See Appendix A.3. □
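The $x_1 > 0$ case of Lemma 5 can be checked with exact arithmetic. The sketch below is not the argument of Appendix A.3; it assumes the standard gambler's-ruin fact that a walk with up-probability $q > 1/2$ started at $x$ ever reaches 0 with probability $((1-q)/q)^x$, and then applies Bayes' rule to the one-step transition. The function names are hypothetical.

```python
from fractions import Fraction


def hit_zero_prob(x, q):
    """P(walk started at x >= 0 ever reaches 0), assuming up-probability q > 1/2."""
    return ((1 - q) / q) ** x


def conditioned_step_probs(x, q):
    """One-step law from a state x > 0, conditional on eventually returning to 0.

    Returns (P(up | return), P(down | return)) via Bayes' rule:
    P(step | return) = P(step) * P(return after step) / P(return from x).
    """
    up = q * hit_zero_prob(x + 1, q) / hit_zero_prob(x, q)
    down = (1 - q) * hit_zero_prob(x - 1, q) / hit_zero_prob(x, q)
    return up, down


if __name__ == "__main__":
    q = Fraction(2, 3)
    for x in range(1, 6):
        up, down = conditioned_step_probs(x, q)
        # The conditioned walk moves up with probability 1 - q and down with q.
        print(x, up, down)
```

Exact rationals are used so that the reversed probabilities $1-q$ and $q$ appear without rounding error.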
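Case 2: $q[n-1] = 0$. We have

\[ \begin{aligned}
& P\Big(Q^0[n + t_1] = q[n-1] + 1 + k \,\Big|\, Q^0[n-1+t_1] = q[n-1] + k, \text{ and } \min_{r \ge n+t_1} Q^0[r] \ge k\Big) \\
&\overset{(a)}{=} P\Big(Q^0[n + t_1] = 1 + k, \text{ and } \min_{r \ge n+t_1} Q^0[r] = k \,\Big|\, Q^0[n-1+t_1] = k, \text{ and } \min_{r \ge n+t_1} Q^0[r] \ge k\Big) \\
&\overset{(b)}{=} P\Big(Q^0[2] = 2, \text{ and } \min_{r \ge 2} Q^0[r] = 1 \,\Big|\, Q^0[1] = 1, \text{ and } \min_{r \ge 2} Q^0[r] \ge 1\Big) \\
&\triangleq x,
\end{aligned} \tag{7.12} \]

where (a) follows from Eq. (7.9) (note its difference with Eq. (7.10)), and
(b) from the stationarity and space-homogeneity of $Q^0$, and the assumption
that $k \ge 1$ (Eq. (7.5)).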

Since Eqs. (7.11) and (7.12) hold for all $x_1, k \in \mathbb{Z}_+$ and $n \ge m_1 + 1$, by
Eq. (7.5), we have that

\[ P(\tilde{Q}[n] = q[n] \mid \tilde{Q}_1^{n-1} = q_1^{n-1}) =
\begin{cases}
\frac{1-p}{\lambda+1-p}, & q[n] - q[n-1] = 1, \\
\frac{\lambda}{\lambda+1-p}, & q[n] - q[n-1] = -1, \\
0, & \text{otherwise},
\end{cases} \tag{7.13} \]

if $q[n-1] > 0$, and

\[ P(\tilde{Q}[n] = q[n] \mid \tilde{Q}_1^{n-1} = q_1^{n-1}) =
\begin{cases}
x, & q[n] - q[n-1] = 1, \\
1 - x, & q[n] - q[n-1] = 0, \\
0, & \text{otherwise},
\end{cases} \tag{7.14} \]

if $q[n-1] = 0$, where $x$ represents the value of the probability in Eq. (7.12).
Clearly, $\tilde{Q}[0] = Q[m_1] = 0$. We next show that $x$ is indeed equal to $\frac{1-p}{\lambda+1-p}$,
which will have proven Proposition 1.

One can in principle obtain the value of $x$ by directly computing the
probability in line (b) of Eq. (7.12), which can be quite difficult to do. Instead,
we will use an indirect approach that turns out to be computationally much
simpler: we will relate $x$ to the rate of deletion of $\pi_{\mathrm{NOB}}$ using renewal
theory, and then solve for $x$. As a by-product of this approach, we will also
get a better understanding of an important regenerative structure of $\pi_{\mathrm{NOB}}$
(Eq. (7.20)), which will be useful for the analysis in subsequent sections.

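By Eqs. (7.13) and (7.14), $\tilde{Q}$ is a positive recurrent Markov chain, and
$\tilde{Q}[n]$ converges to a well defined steady-state distribution, $\tilde{Q}[\infty]$, as $n \to \infty$.
Letting $\pi_i = P(\tilde{Q}[\infty] = i)$, it is easy to verify via the balance equations
that

\[ \pi_i = \pi_0 \cdot x \cdot \frac{\lambda+1-p}{\lambda} \cdot \left(\frac{1-p}{\lambda}\right)^{i-1}, \quad \forall i \ge 1, \tag{7.15} \]

and since $\sum_{i \ge 0} \pi_i = 1$, we obtain

\[ \pi_0 = \frac{1}{1 + x \cdot \frac{\lambda+1-p}{\lambda-(1-p)}}. \tag{7.16} \]

Since the chain $\tilde{Q}$ is also irreducible, the limiting fraction of time that $\tilde{Q}$
spends in state 0 is therefore equal to $\pi_0$:

\[ \lim_{n \to \infty} \frac{1}{n} \sum_{t=1}^{n} \mathbb{1}(\tilde{Q}[t] = 0) = \pi_0 = \frac{1}{1 + x \cdot \frac{\lambda+1-p}{\lambda-(1-p)}}. \tag{7.17} \]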

Next, we would like to know how many of these visits to state 0 correspond to
a deletion. Recall the notion of a busy period and deletion epoch, defined in
Eqs. (7.2) and (7.3), respectively. By Lemma 4, $n$ corresponds to a deletion if
and only if $\tilde{Q}[n] = \tilde{Q}[n-1] = 0$. Consider a deletion in slot $m_i$. If $\tilde{Q}[m_i + 1] = 0$,
then $m_i + 1$ also corresponds to a deletion, i.e., $m_i + 1 = m_{i+1}$. If instead
$\tilde{Q}[m_i + 1] = 1$, which happens with probability $x$, the fact that $\tilde{Q}[m_{i+1} - 1] = 0$
implies that there exists at least one busy period, $\{l, \ldots, u\}$, between $m_i$ and
$m_{i+1}$, with $l = m_i$ and $u \le m_{i+1} - 1$. At the end of this period, a new busy
period starts with probability $x$, and so on. In summary, a deletion epoch
$E_i$ consists of the slot $m_i - m_1$, plus $N_i$ busy periods, where the $N_i$ are i.i.d.,
with$^{17}$

\[ N_1 \overset{d}{=} \mathrm{Geo}(1 - x) - 1, \tag{7.18} \]

and hence

\[ |E_i| = 1 + \sum_{j=1}^{N_i} B_{i,j}, \tag{7.19} \]

where $\{B_{i,j} : i, j \in \mathbb{N}\}$ are i.i.d. random variables, and $B_{i,j}$ corresponds to the
length of the $j$th busy period in the $i$th epoch.

Define $W[t] = (\tilde{Q}[t], \tilde{Q}[t+1])$, $t \in \mathbb{Z}_+$. Since $\tilde{Q}$ is Markov, $W[t]$ is also a
Markov chain, taking values in $\mathbb{Z}_+^2$. Since a deletion occurs in slot $t$ if and
only if $\tilde{Q}[t] = \tilde{Q}[t-1] = 0$ (Lemma 4), the $|E_i|$ correspond to excursion times
between two adjacent visits of $W$ to the state $(0,0)$, and hence are i.i.d.
Using the Elementary Renewal Theorem, we have

\[ \lim_{n \to \infty} \frac{1}{n} I(M, n) = \frac{1}{E(|E_1|)}, \quad \text{a.s.}, \tag{7.20} \]

by viewing each visit of $W$ to $(0,0)$ as a renewal event and using the
fact that exactly one deletion occurs within a deletion epoch. Denoting by
$R_i$ the number of visits to the state 0 within $E_i$, we have that $R_i = 1 + N_i$.
Treating $R_i$ as the reward associated with the renewal interval $E_i$, we have,
by the time-average of a renewal reward process (c.f., Theorem 6, Chapter
3, [3]), that

\[ \lim_{n \to \infty} \frac{1}{n} \sum_{t=1}^{n} \mathbb{1}(\tilde{Q}[t] = 0) = \frac{E(R_1)}{E(|E_1|)} = \frac{E(N_1) + 1}{E(|E_1|)}, \quad \text{a.s.}, \tag{7.21} \]

by treating each visit of $W$ to $(0,0)$ as a renewal event. From Eqs. (7.20)
and (7.21), we have

\[ \frac{\lim_{n \to \infty} \frac{1}{n} I(M, n)}{\lim_{n \to \infty} \frac{1}{n} \sum_{t=1}^{n} \mathbb{1}(\tilde{Q}[t] = 0)} = \frac{1}{E(N_1) + 1}. \tag{7.22} \]

Combining Eqs. (4.1), (7.17) and (7.22), and the fact that $E(N_1) = E(\mathrm{Geo}(1-x)) - 1 = \frac{1}{1-x} - 1$, we have

\[ \frac{\lambda-(1-p)}{\lambda+1-p} \cdot \left[ 1 + x \cdot \frac{\lambda+1-p}{\lambda-(1-p)} \right] = 1 - x, \tag{7.23} \]

which yields

\[ x = \frac{1-p}{\lambda+1-p}. \tag{7.24} \]

$^{17}$ $\mathrm{Geo}(p)$ denotes a geometric random variable with mean $1/p$.

This completes the proof of Proposition 1. □
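The algebra that takes Eq. (7.23) to Eq. (7.24) can be confirmed with exact rational arithmetic. The following is a minimal sketch: expanding the left-hand side of (7.23) gives $a + x = 1 - x$, where $a = \frac{\lambda-(1-p)}{\lambda+1-p}$ is the deletion rate, so $x = (1-a)/2$. The function names and the sample values of $\lambda$ and $p$ are illustrative assumptions.

```python
from fractions import Fraction


def solve_x(lam, p):
    """Solve Eq. (7.23) for x, given 1 > lam > 1 - p > 0."""
    a = (lam - (1 - p)) / (lam + 1 - p)  # deletion rate of the NOB policy
    # (7.23): a * (1 + x / a) = 1 - x, i.e. a + x = 1 - x.
    return (1 - a) / 2


def pi0(lam, p, x):
    """Stationary probability of state 0, Eq. (7.16)."""
    return 1 / (1 + x * (lam + 1 - p) / (lam - (1 - p)))


if __name__ == "__main__":
    lam, p = Fraction(9, 10), Fraction(1, 5)
    x = solve_x(lam, p)
    print("x =", x, " closed form =", (1 - p) / (lam + 1 - p))
    print("pi_0 =", pi0(lam, p, x), " 1 - (1-p)/lam =", 1 - (1 - p) / lam)
```

With $x$ substituted back, $\pi_0$ from Eq. (7.16) reduces to $1 - \frac{1-p}{\lambda}$, which is the $i = 0$ term of Eq. (7.25).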

We summarize some of the key consequences of Proposition 1 below, most 
of which are easy to derive using renewal theory and well-known properties 
of positive-recurrent random walks. 

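Proposition 2. Suppose that $1 > \lambda > 1-p > 0$, and denote by $\tilde{Q}[\infty]$ the
steady-state distribution of $\tilde{Q}$.

1. For all $i \in \mathbb{Z}_+$,

\[ P(\tilde{Q}[\infty] = i) = \left(1 - \frac{1-p}{\lambda}\right) \cdot \left(\frac{1-p}{\lambda}\right)^{i}. \tag{7.25} \]

2. Almost surely, we have that

\[ \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \tilde{Q}[i] = E(\tilde{Q}[\infty]) = \frac{1-p}{\lambda-(1-p)}. \tag{7.26} \]

3. Let $E_i = \{m_i^*, m_i^* + 1, \ldots, m_{i+1}^* - 1\}$. Then the $|E_i|$ are i.i.d., with

\[ E(|E_1|) = \frac{\lambda+1-p}{\lambda-(1-p)}, \tag{7.27} \]

and there exist $a, b > 0$ such that for all $x \in \mathbb{R}_+$

\[ P(|E_1| \ge x) \le a \cdot \exp(-b \cdot x). \tag{7.28} \]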
4. Almost surely, we have that

\[ m_i^* \sim E(|E_1|) \cdot i = \frac{\lambda+1-p}{\lambda-(1-p)} \cdot i, \tag{7.29} \]

as $i \to \infty$.



Proof. See Appendix A.4. □
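The first two claims of Proposition 2 can be sanity-checked numerically. The sketch below sums the geometric distribution of Eq. (7.25) over a truncated range and compares its mean with the closed form of Eq. (7.26). The truncation at 2000 terms and the sample values $\lambda = 0.9$, $p = 0.2$ are illustrative assumptions, and the function names are hypothetical.

```python
def stationary_pmf(i, lam, p):
    """P(Q[inf] = i) from Eq. (7.25): geometric with ratio r = (1 - p) / lam."""
    r = (1 - p) / lam
    return (1 - r) * r ** i


def mean_queue_length(lam, p):
    """Closed-form steady-state mean from Eq. (7.26)."""
    return (1 - p) / (lam - (1 - p))


if __name__ == "__main__":
    lam, p = 0.9, 0.2
    terms = range(2000)  # truncation point; the tail mass is negligible here
    total = sum(stationary_pmf(i, lam, p) for i in terms)
    mean = sum(i * stationary_pmf(i, lam, p) for i in terms)
    print("total mass:", total)
    print("mean (summed):", mean, " mean (closed form):", mean_queue_length(lam, p))
    # As lam -> 1 the closed form tends to (1 - p) / p.
    print("heavy-traffic limit:", mean_queue_length(1.0, p), (1 - p) / p)
```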
7.3. Optimality of the No-Job-Left-Behind Policy in Heavy Traffic. This
section is devoted to proving the optimality of $\pi_{\mathrm{NOB}}$ as $\lambda \to 1$, stated in
the second claim of Theorem 9, which we isolate here in the form of the
following proposition.

Proposition 3. Fix $p \in (0, 1)$. We have that

\[ \lim_{\lambda \to 1} C(p, \lambda, \pi_{\mathrm{NOB}}) = \lim_{\lambda \to 1} C^*_{\Pi_\infty}(p, \lambda). \]

The proof is given at the end of this section, and we do so by showing the
following:

1. Over a finite horizon $N$ and given a fixed number of deletions to be
made, a greedy deletion rule is optimal in minimizing the post-deletion
area under $Q$ over $\{1, \ldots, N\}$.

2. Any point of deletion chosen by $\pi_{\mathrm{NOB}}$ will also be chosen by the greedy
policy, as $N \to \infty$.

3. The fraction of points chosen by the greedy policy but not by $\pi_{\mathrm{NOB}}$
diminishes as $\lambda \to 1$, and hence the delay produced by $\pi_{\mathrm{NOB}}$ is the
best possible, as $\lambda \to 1$.

Fix $N \in \mathbb{N}$. Let $S(Q, N)$ be the partial sum $S(Q, N) = \sum_{n=1}^{N} Q[n]$. For
any sample path $Q$, denote by $\Delta_P(Q, N, n)$ the marginal decrease of area under
$Q$ over the horizon $\{1, \ldots, N\}$ by applying a deletion at slot $n$, i.e.,

\[ \Delta_P(Q, N, n) = S(Q, N) - S(D_P(Q, n), N), \]

and, analogously,

\[ \Delta(Q, N, M') = S(Q, N) - S(D(Q, M'), N), \]

where $M'$ is a deletion sequence.

We next define the notion of a greedy deletion rule, which constructs a
deletion sequence by recursively adding the slot that leads to the maximum
marginal decrease in $S(Q, N)$.

Definition 12. (Greedy Deletion Rule) Fix an initial sample path
$Q^0$, and $K, N \in \mathbb{N}$. The greedy deletion rule is a mapping, $G(Q^0, N, K)$,
which outputs a finite deletion sequence $M^G = \{m_i^G : 1 \le i \le K\}$, given by

\[ m_1^G \in \arg\max_{m \in \Phi(Q^0, N)} \Delta_P(Q^0, N, m), \]

\[ m_k^G \in \arg\max_{m \in \Phi(Q^{k-1}_{M^G}, N)} \Delta_P(Q^{k-1}_{M^G}, N, m), \quad 2 \le k \le K, \]

where $\Phi(Q, N) = \Phi(Q) \cap \{1, \ldots, N\}$ is the set of all deletable locations in
$Q$ in the first $N$ slots, and $Q^k_{M^G} = D(Q^0, \{m_i^G : 1 \le i \le k\})$. Note that we
will allow $m_k^G = \infty$, if there is no more entry to delete (i.e., $\Phi(Q^{k-1}_{M^G}) \cap \{1, \ldots, N\} = \emptyset$).
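Definition 12 can be illustrated with a naive implementation. The sketch below rests on two assumptions about objects defined outside this section: the deletable locations $\Phi(Q)$ are taken to be the arrival slots (those $n$ with $Q[n] > Q[n-1]$), and the point-wise deletion map $D_P(Q, n)$ is taken to lower the path by one from slot $n$ until the original path next reaches zero, as described at the start of Appendix A.2. A path is stored as a list whose index 0 holds the state before slot 1. Ties are broken in favor of the earliest slot, which is an arbitrary choice, and the function names are hypothetical.

```python
def delete_at(path, n):
    """Point-wise deletion (assumed form of D_P): lower the path by 1 from slot n
    until the first later slot at which the original path is 0."""
    new = list(path)
    s = n
    while s < len(path) and (s == n or path[s] >= 1):
        new[s] -= 1
        s += 1
    return new


def deletable(path):
    """Arrival slots within the horizon (assumed form of Phi(Q, N))."""
    return [n for n in range(1, len(path)) if path[n] > path[n - 1]]


def greedy(path, K):
    """Apply up to K greedy deletions; return (chosen slots, resulting path)."""
    path = list(path)
    chosen = []
    for _ in range(K):
        slots = deletable(path)
        if not slots:
            break  # nothing left to delete (m_k^G = infinity in Definition 12)
        # Marginal decrease of area for each candidate; max() keeps the earliest maximizer.
        best = max(slots, key=lambda n: sum(path) - sum(delete_at(path, n)))
        chosen.append(best)
        path = delete_at(path, best)
    return chosen, path


if __name__ == "__main__":
    q0 = [0, 1, 2, 1, 0, 1, 0, 0]  # slots 1..7, starting from an empty queue
    print(greedy(q0, 2))
```

Each step rescans the whole horizon, so this version costs on the order of $K N^2$ operations; it is meant to mirror the definition, not to be efficient.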

We now state a key lemma that will be used in proving Theorem 9. It
shows that over a finite horizon and for a finite number of deletions, the
greedy deletion rule yields the maximum reduction in the area under the
sample path.

Lemma 6. (Dominance of Greedy Policy) Fix an initial sample path
$Q^0$, horizon $N \in \mathbb{N}$, and number of deletions $K \in \mathbb{N}$. Let $M'$ be any deletion
sequence with $I(M', N) = K$. Then

\[ S(D(Q^0, M'), N) \ge S(D(Q^0, M^G), N), \]

where $M^G = G(Q^0, N, K)$ is the deletion sequence generated by the greedy
policy.

Proof. By Lemma 1, it suffices to show that, for any sample path
$\{Q[n] \in \mathbb{Z}_+ : n \in \mathbb{N}\}$ with $|Q[n+1] - Q[n]| = 1$ if $Q[n] > 0$ and $|Q[n+1] - Q[n]| \in \{0, 1\}$ if $Q[n] = 0$, we have

\[ S(D(Q, M'), N) \ge \Delta_P(Q, N, m_1^G) + \min_{\substack{|M| = k-1, \\ M \subset \Phi(D(Q, m_1^G), N)}} S(D(Q^1_{M^G}, M), N). \tag{7.30} \]

By induction, this would imply that we should use the greedy rule at every
step of deletion up to $K$. The following lemma states a simple monotonicity
property. The proof is elementary, and is omitted.

Lemma 7. (Monotonicity in Deletions) Let $Q$ and $Q'$ be two sample
paths such that

\[ Q[n] \le Q'[n], \quad \forall n \in \{1, \ldots, N\}. \]

Then, for any $K \ge 1$,

\[ \min_{\substack{|M| = K, \\ M \subset \Phi(Q, N)}} S(D(Q, M), N) \le \min_{\substack{|M| = K, \\ M \subset \Phi(Q', N)}} S(D(Q', M), N), \tag{7.31} \]

and, for any finite deletion sequence $M' \subset \Phi(Q, N)$,

\[ \Delta(Q, N, M') \le \Delta(Q', N, M'). \tag{7.32} \]

Recall the definition of a busy period in Eq. (7.2). Let $J(Q, N)$ be the total
number of busy periods in $\{Q[n] : 1 \le n \le N\}$, with the additional convention
$Q[N+1] = 0$ so that the last busy period always ends on $N$. Let $B_j = \{l_j, \ldots, u_j\}$ be the $j$th busy period. It can be verified that a deletion in
location $n$ leads to a decrease in the value of $S(Q, N)$ that is no more than
the width of the busy period to which $n$ belongs (c.f., Figure 3.2). Therefore,
by definition, a greedy policy always seeks to delete in each step the first
arriving job during a longest busy period in the current sample path, and
hence

\[ \Delta(Q, N, G(Q, N, 1)) = \max_{1 \le j \le J(Q, N)} |B_j|. \tag{7.33} \]

Let

\[ J^*(Q, N) = \arg\max_{1 \le j \le J(Q, N)} |B_j|. \]

We consider the following cases, depending on whether $M'$ chooses to delete
any job in the busy periods in $J^*(Q, N)$.

Case 1: $M' \cap \big( \cup_{j \in J^*(Q, N)} B_j \big) \ne \emptyset$. If $l_{j^*} \in M'$ for some $j^* \in J^*$, by
Eq. (7.33), we can set $m_1^G$ to $l_{j^*}$. Since $m_1^G \in M'$ and the order of deletions
does not impact the final resulting delay (Lemma 1), we have that Eq. (7.30)
holds, and we are done. Otherwise, choose $m^* \in M' \cap B_{j^*}$ for some $j^* \in J^*$,
and we have $m^* > l_{j^*}$. Let

\[ Q' = D_P(Q, m^*), \quad \text{and} \quad \hat{Q} = D_P(Q, l_{j^*}). \]

Since $Q[n] > 0$, $\forall n \in \{l_{j^*}, \ldots, u_{j^*} - 1\}$, we have $\hat{Q}[n] = Q[n] - 1 \le Q'[n]$,
$\forall n \in \{l_{j^*}, \ldots, u_{j^*} - 1\}$, and $Q'[n] = Q[n] = \hat{Q}[n]$, $\forall n \notin \{l_{j^*}, \ldots, u_{j^*} - 1\}$,
which implies that

\[ \hat{Q}[n] \le Q'[n], \quad \forall n \in \{1, \ldots, N\}. \tag{7.34} \]

Eq. (7.30) holds by combining Eq. (7.34) and Eq. (7.31) in Lemma 7, with
$K = k - 1$.

Case 2: $M' \cap \big( \cup_{j \in J^*(Q, N)} B_j \big) = \emptyset$. Let $m^*$ be any element in $M'$, and
$Q' = D_P(Q, m^*)$. Clearly $Q[n] \ge Q'[n]$ for all $n \in \{1, \ldots, N\}$, and by
Eq. (7.32) in Lemma 7, we have that$^{18}$

\[ \Delta(Q, N, M' \setminus \{m^*\}) \ge \Delta(D_P(Q, m^*), N, M' \setminus \{m^*\}). \tag{7.35} \]

Since $M' \cap \big( \cup_{j \in J^*(Q, N)} B_j \big) = \emptyset$, we have that

\[ \Delta_P(D(Q, M' \setminus \{m^*\}), N, m_1^G) = \max_{1 \le j \le J(Q, N)} |B_j| > \Delta_P(Q, N, m^*). \tag{7.36} \]

$^{18}$ For finite sets $A$ and $B$, $A \setminus B = \{a \in A : a \notin B\}$.

Let $\hat{M} = \{m_1^G\} \cup (M' \setminus \{m^*\})$. We have that

\[ \begin{aligned}
S(D(Q, \hat{M}), N)
&= S(Q, N) - \Delta(Q, N, M' \setminus \{m^*\}) - \Delta_P(D(Q, M' \setminus \{m^*\}), N, m_1^G) \\
&\overset{(a)}{\le} S(Q, N) - \Delta(D_P(Q, m^*), N, M' \setminus \{m^*\}) - \Delta_P(D(Q, M' \setminus \{m^*\}), N, m_1^G) \\
&\overset{(b)}{<} S(Q, N) - \Delta(D_P(Q, m^*), N, M' \setminus \{m^*\}) - \Delta_P(Q, N, m^*) \\
&= S(D(Q, M'), N),
\end{aligned} \]

where (a) and (b) follow from Eqs. (7.35) and (7.36), respectively, which
shows that Eq. (7.30) holds (and in this case the inequality there is strict).
Cases 1 and 2 together complete the proof of Lemma 6. □



We are now ready to prove Proposition 3. 

Proof. (Proposition 3) Lemma 6 shows that, for any fixed number of
deletions over a finite horizon $N$, the greedy deletion policy (Definition 12)
yields the smallest area under the resulting sample path, $Q$, over $\{1, \ldots, N\}$.
The main idea of the proof is to show that the area under $Q$ after applying $\pi_{\mathrm{NOB}}$
is asymptotically the same as that of the greedy policy, as $N \to \infty$ and $\lambda \to 1$
(in this particular order of limits). In some sense, this means that the jobs
in $M^*$ account for almost all of the delays in the system, as $\lambda \to 1$. The
following technical lemma is useful.

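Lemma 8. For a finite set $S \subset \mathbb{R}$, and $k \in \mathbb{N}$, define

\[ f(S, k) = \frac{\text{sum of the } k \text{ largest elements in } S}{|S|}. \]

Let $\{X_i : 1 \le i \le n\}$ be i.i.d. random variables taking values in $\mathbb{Z}_+$, where
$E(X_1) < \infty$. Then for any sequence of random variables $\{H_n : n \in \mathbb{N}\}$, with
$H_n \lesssim \alpha n$ a.s. as $n \to \infty$ for some $\alpha \in (0, 1)$, we have

\[ \limsup_{n \to \infty} f(\{X_i : 1 \le i \le n\}, H_n) \le E\big(X_1 \cdot \mathbb{1}(X_1 \ge \bar{F}^{-1}_{X_1}(\alpha))\big), \quad \text{a.s.}, \tag{7.37} \]

where $\bar{F}^{-1}_{X_1}(y) = \min\{x \in \mathbb{N} : P(X_1 \ge x) < y\}$.

Proof. See Appendix A.5. □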
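The quantity $f(S, k)$ in Lemma 8 is straightforward to compute. The sketch below treats $S$ as a list so that repeated values are allowed, as with the i.i.d. samples $\{X_i\}$; the function name is hypothetical.

```python
import heapq


def top_k_fraction(values, k):
    """f(S, k): sum of the k largest elements of `values`, divided by len(values)."""
    return sum(heapq.nlargest(k, values)) / len(values)


if __name__ == "__main__":
    sample = [5, 1, 3, 2]
    print(top_k_fraction(sample, 2))  # (5 + 3) / 4
```

`heapq.nlargest` avoids sorting the whole list when $k$ is small relative to $|S|$.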
Fix an initial sample path $Q^0$. We will denote by $M^* = \{m_i^* : i \in \mathbb{N}\}$ the
deletion sequence generated by $\pi_{\mathrm{NOB}}$ on $Q^0$. Define

\[ l(n) = n - \max_{1 \le i \le I(M^*, n)} |E_i|, \tag{7.38} \]

where $E_i$ is the $i$th deletion epoch of $M^*$, defined in Eq. (7.3). Since $Q^0[n] \ge Q^0[m_i^*]$ for all $i \in \mathbb{N}$ and $n \ge m_i^*$, it is easy to check that

\[ \Delta_P\big(D(Q^0, \{m_j^* : 1 \le j \le i-1\}), n, m_i^*\big) = n - m_i^* + 1, \]

for all $i \in \mathbb{N}$. The function $l$ was defined so that the first $I(M^*, l(n))$ deletions
made by a greedy rule over the horizon $\{1, \ldots, n\}$ are exactly $\{1, \ldots, l(n)\} \cap M^*$. More formally, we have the following lemma.

Lemma 9. Fix $n \in \mathbb{N}$, and let $M^G = G(Q^0, n, I(M^*, l(n)))$. Then $m_i^G = m_i^*$, for all $i \in \{1, \ldots, I(M^*, l(n))\}$.

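Fix $K \in \mathbb{N}$, and an arbitrary feasible deletion sequence, $\tilde{M}$, generated by
a policy in $\Pi_\infty$. We can write

\[ \begin{aligned}
I(\tilde{M}, m_K^*)
&= I(M^*, l(m_K^*)) + \big(I(M^*, m_K^*) - I(M^*, l(m_K^*))\big) + \big(I(\tilde{M}, m_K^*) - I(M^*, m_K^*)\big) \\
&= I(M^*, l(m_K^*)) + \big(K - I(M^*, l(m_K^*))\big) + \big(I(\tilde{M}, m_K^*) - I(M^*, m_K^*)\big) \\
&= I(M^*, l(m_K^*)) + h(K),
\end{aligned} \tag{7.39} \]

where

\[ h(K) = \big(K - I(M^*, l(m_K^*))\big) + \big(I(\tilde{M}, m_K^*) - I(M^*, m_K^*)\big). \tag{7.40} \]

We have the following characterization of $h$.

Lemma 10. $h(K) \lesssim \frac{1-\lambda}{\lambda-(1-p)} \cdot K$, as $K \to \infty$.

Proof. See Appendix A.6. □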
Let

\[ M^{G,n} = G(Q^0, n, I(\tilde{M}, n)), \tag{7.41} \]

where the greedy deletion map $G$ was defined in Definition 12. By Lemma
9 and the definition of $M^{G,n}$, we have that

\[ M^* \cap \{1, \ldots, l(m_K^*)\} \subset M^{G, m_K^*}. \tag{7.42} \]

Therefore, we can write

\[ M^{G, m_K^*} = \big(M^* \cap \{1, \ldots, l(m_K^*)\}\big) \cup \bar{M}_K^G, \tag{7.43} \]

where $\bar{M}_K^G = M^{G, m_K^*} \setminus \big(M^* \cap \{1, \ldots, l(m_K^*)\}\big)$. Since $|M^{G, m_K^*}| = I(\tilde{M}, m_K^*)$
by definition, by Eq. (7.39),

\[ |\bar{M}_K^G| = h(K). \tag{7.44} \]

We have

\[ \begin{aligned}
& S(D(Q^0, M^*), m_K^*) - S(D(Q^0, \tilde{M}), m_K^*) \\
&\overset{(a)}{\le} S(D(Q^0, M^*), m_K^*) - S(D(Q^0, M^{G, m_K^*}), m_K^*) \\
&\overset{(b)}{\le} \Delta\big(D(Q^0, M^*), m_K^*, \bar{M}_K^G\big),
\end{aligned} \tag{7.45} \]

where (a) is based on the dominance of the greedy policy over any finite
horizon (Lemma 6), and (b) follows from Eq. (7.43).

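Finally, we claim that there exists $g(x) : \mathbb{R} \to \mathbb{R}_+$, with $g(x) \to 0$ as $x \to 1$,
such that

\[ \limsup_{K \to \infty} \frac{\Delta\big(D(Q^0, M^*), m_K^*, \bar{M}_K^G\big)}{m_K^*} \le g(\lambda), \quad \text{a.s.} \tag{7.46} \]

Eqs. (7.45) and (7.46) combined imply that

\[ \begin{aligned}
C(p, \lambda, \pi_{\mathrm{NOB}})
&= \limsup_{K \to \infty} \frac{S(D(Q^0, M^*), m_K^*)}{m_K^*} \\
&\le g(\lambda) + \limsup_{K \to \infty} \frac{S(D(Q^0, \tilde{M}), m_K^*)}{m_K^*} \\
&= g(\lambda) + \limsup_{n \to \infty} \frac{S(D(Q^0, \tilde{M}), n)}{n}, \quad \text{a.s.},
\end{aligned} \tag{7.47} \]

which shows that

\[ C(p, \lambda, \pi_{\mathrm{NOB}}) \le g(\lambda) + \inf_{\pi \in \Pi_\infty} C(p, \lambda, \pi). \]

Since $g(\lambda) \to 0$ as $\lambda \to 1$, this proves Proposition 3.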

To show Eq. (7.46), denote by $Q$ the sample path after applying $\pi_{\mathrm{NOB}}$,

\[ Q = D(Q^0, M^*), \]

and by $V_i$ the area under $Q$ within $E_i$,

\[ V_i = \sum_{n \in E_i} Q[n]. \]

An example of $V_i$ is illustrated as the area of the shaded region in Figure
3.2. By Proposition 1, $Q$ is a Markov chain and so is the process $W[n] = (Q[n], Q[n+1])$. By Lemma 4, $E_i$ corresponds to the indices between two
adjacent returns of the chain $W$ to state $(0,0)$. Since the $i$th return of a
Markov chain to a particular state is a stopping time, it can be shown, using
the strong Markov property of $W$, that the segments of $Q$, $\{Q[n] : n \in E_i\}$,
are mutually independent and identically distributed among different values
of $i$. Therefore, the $V_i$'s are i.i.d. Furthermore,

\[ E(V_1) \overset{(a)}{\le} E(|E_1|^2) \overset{(b)}{<} \infty, \tag{7.48} \]

where (a) follows from the fact that $|Q[n+1] - Q[n]| \le 1$ for all $n$, and hence
$V_i \le |E_i|^2$ for any sample path of $Q^0$, and (b) from the exponential tail bound
on $P(|E_1| \ge x)$, given in Eq. (7.28).

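Since the value of $Q$ on the two ends of $E_i$, $m_i^*$ and $m_{i+1}^* - 1$, are both
zero, each additional deletion within $E_i$ cannot produce a marginal decrease
of area under $Q$ of more than $V_i$ (c.f., Figure 3.2). Therefore, the value of
$\Delta\big(D(Q^0, M^*), m_K^*, \bar{M}_K^G\big)$ can be no greater than the sum of the $h(K)$
largest $V_i$'s over the horizon $n \in \{1, \ldots, m_K^*\}$. We have

\[ \begin{aligned}
\limsup_{K \to \infty} \frac{\Delta\big(D(Q^0, M^*), m_K^*, \bar{M}_K^G\big)}{m_K^*}
&\le \limsup_{K \to \infty} f(\{V_i : 1 \le i \le K\}, h(K)) \cdot \frac{K}{m_K^*} \\
&\overset{(a)}{=} \limsup_{K \to \infty} f(\{V_i : 1 \le i \le K\}, h(K)) \cdot \frac{\lambda-(1-p)}{\lambda+1-p} \\
&\overset{(b)}{\le} E\Big(V_1 \cdot \mathbb{1}\Big(V_1 \ge \bar{F}^{-1}_{V_1}\Big(\tfrac{1-\lambda}{\lambda-(1-p)}\Big)\Big)\Big) \cdot \frac{\lambda-(1-p)}{\lambda+1-p},
\end{aligned} \tag{7.49} \]

where (a) follows from Eq. (7.29), and (b) from Lemmas 8 and 10. Since
$E(V_1) < \infty$, and $\bar{F}^{-1}_{V_1}(x) \to \infty$ as $x \to 0$, it follows that

\[ E\Big(V_1 \cdot \mathbb{1}\Big(V_1 \ge \bar{F}^{-1}_{V_1}\Big(\tfrac{1-\lambda}{\lambda-(1-p)}\Big)\Big)\Big) \to 0, \]

as $\lambda \to 1$. Eq. (7.46) is proved by setting $g(\lambda) = E\Big(V_1 \cdot \mathbb{1}\Big(V_1 \ge \bar{F}^{-1}_{V_1}\Big(\tfrac{1-\lambda}{\lambda-(1-p)}\Big)\Big)\Big) \cdot \frac{\lambda-(1-p)}{\lambda+1-p}$. This completes the proof of Proposition 3. □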

7.3.1. Why not use Greedy? The proof of Proposition 3 relies on a
sample-path-wise coupling to the performance of a greedy deletion rule. It is
then only natural to ask: since the time horizon is indeed finite in all
practical applications, why don't we simply use the greedy rule as the preferred
offline policy, as opposed to $\pi_{\mathrm{NOB}}$?

There are at least two reasons for focusing on $\pi_{\mathrm{NOB}}$ instead of the greedy
rule. First, the structure of the greedy rule is highly global, in the sense
that each deletion decision uses information of the entire sample path over
the horizon. As a result, the greedy rule tells us little about how to design a
good policy with a fixed lookahead window (e.g., Theorem 11). In contrast,
the performance analysis of $\pi_{\mathrm{NOB}}$ in Section 7.2 reveals a highly
regenerative structure: the deletions made by $\pi_{\mathrm{NOB}}$ essentially depend only on the
dynamics of $Q^0$ in the same deletion epoch (the $E_i$'s), and what happens
beyond the current epoch becomes irrelevant. This is the key intuition that
led to our construction of the finite-lookahead policy in Theorem 11. A
second (and perhaps minor) reason is that of computational complexity. By a
small sacrifice in performance, $\pi_{\mathrm{NOB}}$ can be efficiently implemented using a
linear-time algorithm (Section 4.2.2), while it is easy to see that a naive
implementation of the greedy rule would require super-linear complexity with
respect to the length of the horizon.
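The deletions of the no-job-left-behind rule can be sketched on a finite horizon in a single backward pass. This is not the algorithm of Section 4.2.2, which is not reproduced here. It assumes the characterization in Eq. (A.4), namely that the $k$th deletion falls on the first slot at which $Q^0$ reaches level $k$ and never drops below $k$ afterwards, and it treats the end of the recorded path as the end of time, which is a simplifying assumption for illustration. The path is stored as a list starting from $Q^0[0] = 0$, and the function names are hypothetical.

```python
def nob_deletions(path):
    """Slots n >= 1 where an arrival lifts the path to a level it never falls below again."""
    n_slots = len(path)
    suffix_min = [0] * n_slots
    running = path[-1]
    for n in range(n_slots - 1, -1, -1):  # backward pass: min of path[n:]
        running = min(running, path[n])
        suffix_min[n] = running
    return [
        n
        for n in range(1, n_slots)
        if path[n] > path[n - 1] and suffix_min[n] >= path[n]
    ]


def apply_deletions(path, deletions):
    """Post-deletion path Q[n] = Q0[n] - I(M, n), as in the first claim of Lemma 4."""
    out, count, pending = [], 0, sorted(deletions)
    for n, value in enumerate(path):
        while count < len(pending) and pending[count] <= n:
            count += 1
        out.append(value - count)
    return out


if __name__ == "__main__":
    q0 = [0, 1, 0, 1, 2, 1, 2, 3]
    m = nob_deletions(q0)
    print(m, apply_deletions(q0, m))
```

Both passes are linear in the horizon length, in contrast with the repeated rescans needed by a naive greedy rule.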

7.4. Proof of Theorem 9. 

Proof. (Theorem 9) The fact that $\pi_{\mathrm{NOB}}$ is feasible follows from Eq. (4.1)
in Lemma 2, i.e.,

\[ \limsup_{n \to \infty} \frac{1}{n} I(M^*, n) \le \frac{\lambda-(1-p)}{\lambda+1-p} < \frac{p}{\lambda+1-p}, \quad \text{a.s.} \]

Let $\{Q[n] : n \in \mathbb{Z}_+\}$ be the resulting sample path after applying $\pi_{\mathrm{NOB}}$ to
the initial sample path $\{Q^0[n] : n \in \mathbb{Z}_+\}$, and let

\[ \tilde{Q}[n] = Q[n + m_1^*], \quad \forall n \in \mathbb{N}, \]

where $m_1^*$ is the index of the first deletion made by $\pi_{\mathrm{NOB}}$. Since $\lambda > 1-p$,
the random walk $Q^0$ is transient, and hence $m_1^* < \infty$ almost surely. We have
that, almost surely,

\[ \begin{aligned}
C(p, \lambda, \pi_{\mathrm{NOB}})
&= \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} Q[i] \\
&= \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{m_1^*} Q[i] + \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \tilde{Q}[i] \\
&= \frac{1-p}{\lambda-(1-p)},
\end{aligned} \tag{7.50} \]

where the last equality follows from Eq. (7.26) in Proposition 2, and the fact
that $m_1^* < \infty$ almost surely. Letting $\lambda \to 1$ in Eq. (7.50) yields the finite limit
of delay under heavy traffic:

\[ \lim_{\lambda \to 1} C(p, \lambda, \pi_{\mathrm{NOB}}) = \lim_{\lambda \to 1} \frac{1-p}{\lambda-(1-p)} = \frac{1-p}{p}. \]

Finally, the delay optimality of $\pi_{\mathrm{NOB}}$ in heavy traffic was proved in
Proposition 3, i.e., that

\[ \lim_{\lambda \to 1} C(p, \lambda, \pi_{\mathrm{NOB}}) = \lim_{\lambda \to 1} C^*_{\Pi_\infty}(p, \lambda). \]

This completes the proof of Theorem 9. □
8. Policies with a Finite Lookahead. 

8.1. Proof of Theorem 11. 

Proof. (Theorem 11) As pointed out in the discussion preceding
Theorem 11, for any initial sample path and $w < \infty$, an arrival that is deleted
under the $\pi_{\mathrm{NOB}}$ policy will also be deleted under $\pi^{w}_{\mathrm{NOB}}$. Therefore, the
delay guarantee for $\pi_{\mathrm{NOB}}$ (Theorem 9) carries over to $\pi^{w(\lambda)}_{\mathrm{NOB}}$, and for the rest
of the proof, we will focus on showing that $\pi^{w(\lambda)}_{\mathrm{NOB}}$ is feasible under an
appropriate scaling of $w(\lambda)$. We begin by stating an exponential tail bound
on the distribution of the discrete-time predictive window, $W(\lambda, n)$, defined
in Eq. (3.6),

\[ W(\lambda, n) = \max\{k \in \mathbb{Z}_+ : T_{n+k} \le T_n + w(\lambda)\}. \]

It is easy to see that $\{W(\lambda, m_i^*) : i \in \mathbb{N}\}$ are i.i.d., with $W(\lambda, m_1^*)$ distributed
as a Poisson random variable with mean $(\lambda + 1 - p) w(\lambda)$. Since



•(w(X,mf)>x) 
where the $X_k$ are i.i.d. Poisson random variables with mean $\lambda + (1-p)$,
applying the Chernoff bound, we have that there exist $c, d > 0$ such that

\[ P\Big(W(\lambda, m_1^*) \le \frac{\lambda+1-p}{2} \cdot w(\lambda)\Big) \le c \cdot \exp(-d \cdot w(\lambda)), \tag{8.1} \]

for all $w(\lambda) > 0$.

We now analyze the deletion rate resulting from the $\pi^{w(\lambda)}_{\mathrm{NOB}}$ policy. Purely for the
purpose of analysis (as opposed to practical efficiency), we will consider
a new deletion policy, denoted by $\sigma^{w(\lambda)}$, which can be viewed as a relaxation
of $\pi^{w(\lambda)}_{\mathrm{NOB}}$.

Definition 13. Fix $w \in \mathbb{R}_+$. The deletion policy $\sigma^{w}$ is defined such that
for each deletion epoch $E_i$, $i \in \mathbb{N}$,

1. if $|E_i| < W(\lambda, m_i^*)$, then only the first arrival of this epoch, namely,
the arrival in slot $m_i^*$, is deleted;

2. otherwise, all arrivals within this epoch are deleted.



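It is easy to verify that $\sigma^{w}$ can be implemented with $w$ units of
look-ahead, and the set of deletions made by $\sigma^{w(\lambda)}$ is a strict superset of that made by $\pi^{w(\lambda)}_{\mathrm{NOB}}$
almost surely. Hence, the feasibility of $\sigma^{w(\lambda)}$ will imply that of $\pi^{w(\lambda)}_{\mathrm{NOB}}$.

Denote by $D_i$ the number of deletions made by $\sigma^{w(\lambda)}$ in the epoch $E_i$.
By the construction of the policy, the $D_i$ are i.i.d., and depend only on the
length of $E_i$ and the number of arrivals within. We have$^{19}$

\[ \begin{aligned}
E(D_1)
&\le 1 + E\big[|E_1| \cdot \mathbb{1}(|E_1| \ge W(\lambda, m_1^*))\big] \\
&\le 1 + E\Big[|E_1| \cdot \mathbb{1}\Big(|E_1| \ge \frac{\lambda+1-p}{2} \cdot w(\lambda)\Big)\Big]
 + E(|E_1|) \cdot P\Big(W(\lambda, m_1^*) \le \frac{\lambda+1-p}{2} \cdot w(\lambda)\Big) \\
&\le 1 + \sum_{k = \frac{\lambda+1-p}{2} w(\lambda)}^{\infty} k \cdot a \cdot \exp(-b \cdot k)
 + \frac{\lambda+1-p}{\lambda-(1-p)} \cdot c \cdot \exp(-d \cdot w(\lambda)) \\
&\overset{(a)}{\le} 1 + h \cdot w(\lambda) \cdot \exp(-l \cdot w(\lambda)),
\end{aligned} \tag{8.2} \]

for some $h, l > 0$, where (a) follows from the fact that $\sum_{k=n}^{\infty} k \cdot \exp(-b \cdot k) = O(n \cdot \exp(-b \cdot n))$ as $n \to \infty$.

$^{19}$ For simplicity of notation, we assume that $p \cdot w(\lambda)$ is always an integer. This
does not change the scaling behavior of $w(\lambda)$.

Since the $D_i$ are i.i.d., using basic renewal theory, it is not difficult to show
that the average rate of deletion in discrete time under the policy $\sigma^{w(\lambda)}$ is
equal to $\frac{E(D_1)}{E(|E_1|)}$. In order for the policy to be feasible, one must have that

\[ \frac{E(D_1)}{E(|E_1|)} = E(D_1) \cdot \frac{\lambda-(1-p)}{\lambda+1-p} \le \frac{p}{\lambda+1-p}. \tag{8.3} \]

By Eqs. (8.2) and (8.3), we want to ensure that

\[ \frac{p}{\lambda-(1-p)} \ge 1 + h \cdot w(\lambda) \cdot \exp(-l \cdot w(\lambda)), \]

which yields, after taking the logarithm on both sides,

\[ w(\lambda) \ge \frac{1}{l} \log\left( \frac{[\lambda-(1-p)] \cdot h \cdot w(\lambda)}{1-\lambda} \right). \tag{8.4} \]

It is not difficult to verify that for all $p \in (0,1)$ there exists a constant
$C$ such that the above inequality holds for all $\lambda \in (1-p, 1)$, by letting
$w(\lambda) = C \log\big(\frac{1}{1-\lambda}\big)$. This proves the feasibility of $\sigma^{w(\lambda)}$, which implies that
$\pi^{w(\lambda)}_{\mathrm{NOB}}$ is also feasible. This completes the proof of Theorem 11. □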
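The role of the constant $C$ in $w(\lambda) = C \log\frac{1}{1-\lambda}$ can be seen numerically. The inequality preceding Eq. (8.4) is equivalent to $h \cdot w \cdot \exp(-l \cdot w) \le \frac{1-\lambda}{\lambda-(1-p)}$. The sketch below checks this condition for a few values of $\lambda$ close to 1. The constants $h = l = 1$ are placeholders chosen purely for illustration (the proof only asserts that some positive $h$ and $l$ exist), and the function names are hypothetical.

```python
import math


def slack(lam, p):
    """Right-hand side (1 - lam) / (lam - (1 - p)) of the feasibility condition."""
    return (1 - lam) / (lam - (1 - p))


def is_feasible(w, lam, p, h=1.0, l=1.0):
    """Check h * w * exp(-l * w) <= slack, with placeholder constants h and l."""
    return h * w * math.exp(-l * w) <= slack(lam, p)


def window(lam, C):
    """Lookahead window w(lam) = C * log(1 / (1 - lam))."""
    return C * math.log(1.0 / (1.0 - lam))


if __name__ == "__main__":
    p = 0.5
    for lam in (0.9, 0.99, 0.999):
        for C in (1.0, 2.0):
            w = window(lam, C)
            print(f"lam={lam}, C={C}, w={w:.3f}, feasible={is_feasible(w, lam, p)}")
```

Under these placeholder constants, a window with $C = 2$ satisfies the condition at every tested $\lambda$, while $C = 1$ fails as $\lambda$ approaches 1, which illustrates why $C$ must be chosen large enough relative to $l$.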

9. Concluding Remarks and Future Work. The main objective of
this paper is to study the impact of future information on the performance of
a class of admissions control problems, with a constraint on the time-average
rate of redirection. Our model is motivated as a study of a dynamic resource
allocation problem between slow (congestion-prone) and fast (congestion-free)
processing resources. It could also serve as a simple canonical model
for analyzing delays in large server farms or cloud clusters with resource
pooling [17] (Section 5). Our main results show that the availability of
future information can dramatically reduce the delay experienced by admitted
customers: the delay converges to a finite constant even as the traffic load
approaches the system capacity ("heavy-traffic delay collapse"), if the
decision maker is allowed a sufficiently large lookahead window (Theorem
11).

There are several interesting directions for future exploration. On the 
theoretical end, a main open question is whether a matching lower-bound 
on the amount of future information required to achieve the heavy-traffic 
delay collapse can be proved (Conjecture 1), which, together with the upper 
bound given in Theorem 11, would imply a duality between delay and the 
length of lookahead into the future. 

Second, we believe that our results can be generalized to the cases where
the arrival and service processes are non-Poisson. We note that the $\pi_{\mathrm{NOB}}$
policy is indeed feasible for a wide range of non-Poisson arrival and
service processes (e.g., renewal processes), as long as they satisfy a form of
strong law of large numbers, with appropriate time-average rates (Lemma 2).
It seems more challenging to generalize results on the optimality of $\pi_{\mathrm{NOB}}$
and the performance guarantees. However, it may be possible to establish
a generalization of the delay optimality result using limit theorems (e.g.,
diffusion approximations). For instance, with sufficiently well-behaved
arrival and service processes, we expect that one can establish a result similar
to Proposition 1 by characterizing the resulting queue length process from
$\pi_{\mathrm{NOB}}$ as a reflected Brownian motion in $\mathbb{R}_+$, in the limit of $\lambda \to 1$ and $p \to 0$,
with appropriate scaling.

There are other issues that need to be addressed if our offline policies (or
policies with a finite lookahead) are to be applied in practice. An important
question is the impact of observational noise on performance, since
in reality the future seen in the lookahead window cannot be expected to
match the actual realization exactly. We conjecture, based on the analysis of
$\pi_{\mathrm{NOB}}$, that the performance of both $\pi_{\mathrm{NOB}}$ and its finite-lookahead version
is robust to small noises or perturbations (e.g., if the actual sample path
is at most $\epsilon$ away from the predicted one), while it remains to thoroughly
verify and quantify the extent of the impact, either empirically or through
theory. Also, it is unclear what the best practices should be when the
lookahead window is very small relative to the traffic intensity $\lambda$ ($w \ll \log\frac{1}{1-\lambda}$),
and this regime is not covered by the results in this paper (as illustrated in
Figure 3.4).
APPENDIX A: ADDITIONAL PROOFS

A.1. Proof of Lemma 2.

Proof. (Lemma 2) Since $\lambda > 1-p$, with probability one, there exists
$T < \infty$ such that the continuous-time queue length process without deletion
satisfies $Q^0(t) > 0$ for all $t \ge T$. Therefore, without any deletion, all service
tokens are matched with some job after time $T$. By the stack interpretation,
$\pi_{\mathrm{NOB}}$ only deletes jobs that would not have been served, and hence does
not change the original matching of service tokens to jobs. This proves the
first claim.

By the first claim, since all subsequent service tokens are matched with a
job after some time $T$, there exists some $N < \infty$, such that

\[ Q[n] = Q[N] + (A[n] - A[N]) - (S[n] - S[N]) - I(M^*, n), \tag{A.1} \]

for all $n \ge N$, where $A[n]$ and $S[n]$ are the cumulative numbers of arrival
and service tokens by slot $n$, respectively. The second claim follows by
multiplying both sides of Eq. (A.1) by $\frac{1}{n}$, and using the fact that $\lim_{n \to \infty} \frac{1}{n} A[n] = \frac{\lambda}{\lambda+1-p}$ and $\lim_{n \to \infty} \frac{1}{n} S[n] = \frac{1-p}{\lambda+1-p}$ a.s., $Q[n] \ge 0$ for all $n$, and $N < \infty$
a.s. □
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A. 2. Proof of Lemma 4. 
PROOF. (Lemma 4) 

1. Recall the point-wise deletion map, $D_P(Q,n)$, defined in Definition 2.
For any initial sample path $Q^0$, let $Q^1 = D_P(Q^0, m)$ for some $m \in \mathbb{N}$.
It is easy to see that, for all $n > m$, $Q^1[n] = Q^0[n] - 1$ if and only if
$Q^0[s] \geq 1$ for all $s \in \{m+1, \ldots, n\}$. Repeating this argument $I(M,n)$
times, we have that

\[ Q[n] = \tilde{Q}[n + m_1] = Q^0[n + m_1] - I(M, n + m_1), \tag{A.2} \]

if and only if for all $k \in \{1, \ldots, I(M, n + m_1)\}$,

\[ Q^0[s] \geq k, \quad \text{for all } s \in \{m_k + 1, \ldots, n + m_1\}. \tag{A.3} \]

Note that Eq. (A.3) is implied by (and in fact, equivalent to) the
definition of the $m_k$'s (Definition 8), namely, that for all $k \in \mathbb{N}$, $Q^0[s] \geq k$
for all $s \geq m_k + 1$. This proves the first claim.

2. Suppose $Q[n] = Q[n-1] = 0$. Since $P(Q^0[t] \neq Q^0[t-1] \mid Q^0[t-1] > 0) = 1$
for all $t \in \mathbb{N}$ (cf. Eq. (2.1)), at least one deletion occurs on the slots
$\{n - 1 + m_1, n + m_1\}$. If the deletion occurs on $n + m_1$, we are done. Suppose
a deletion occurs on $n - 1 + m_1$. Then $Q^0[n + m_1] \geq Q^0[n - 1 + m_1]$,
and hence

\[ Q^0[n + m_1] = Q^0[n - 1 + m_1] + 1, \]

which implies that a deletion must also occur on $n + m_1$, for otherwise
$Q[n] = Q[n-1] + 1 = 1 \neq 0$. This shows that $n = m_i - m_1$ for some $i \in \mathbb{N}$.

Now, suppose that $n = m_i - m_1$ for some $i \in \mathbb{N}$. Let

\[ n_k = \inf\{n \in \mathbb{N} : Q^0[n] = k, \text{ and } Q^0[t] \geq k, \ \forall t \geq n\}. \tag{A.4} \]

Since the random walk $Q^0$ is transient and the magnitude of its step
size is at most 1, it follows that $n_k < \infty$ for all $k \in \mathbb{N}$ a.s., and that
$m_k = n_k$, $\forall k \in \mathbb{N}$. We have

\[
\begin{aligned}
Q[n] &\overset{(a)}{=} Q^0[n + m_1] - I(M, n + m_1) \\
&= Q^0[m_i] - I(M, m_i) \\
&\overset{(b)}{=} i - i = 0,
\end{aligned}
\tag{A.5}
\]
where (a) follows from Eq. (A.2), and (b) from the fact that $n_i = m_i$. To
show that $Q[n-1] = 0$, note that since $n = m_i - m_1$, an arrival must have
occurred in $Q^0$ on slot $m_i$, and hence $Q^0[n - 1 + m_1] = Q^0[n + m_1] - 1$.
Therefore, by the definition of $m_i$,

\[ Q^0[t] - Q^0[n - 1 + m_1] = (Q^0[t] - Q^0[n + m_1]) + 1 > 0, \quad \forall t \geq n + m_1, \]

which implies that $n - 1 = m_{i-1} - m_1$, and hence $Q[n-1] = 0$, in light
of Eq. (A.5). This proves the claim.
3. For all $n \in \mathbb{Z}_+$, we have

\[
\begin{aligned}
Q[n] &= Q[m_{I(M,n+m_1)} - m_1] + \left(Q[n] - Q[m_{I(M,n+m_1)} - m_1]\right) \\
&\overset{(a)}{=} Q[n] - Q[m_{I(M,n+m_1)} - m_1] \\
&\overset{(b)}{=} Q^0[n + m_1] - Q^0[m_{I(M,n+m_1)}] \\
&\overset{(c)}{\geq} 0,
\end{aligned}
\tag{A.6}
\]

where (a) follows from the second claim (cf. Eq. (A.5)), (b) from the
fact that there is no deletion on any slot in $\{m_{I(M,n+m_1)}, \ldots, n + m_1\}$,
and (c) from the fact that $n + m_1 \geq m_{I(M,n+m_1)}$ and Eq. (3.3).

□

A.3. Proof of Lemma 5.

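Proof. (Lemma 5) Since the random walk $X$ lives in $\mathbb{Z}_+$ and can take
jumps of size at most 1, it suffices to verify that

\[ P\left(X[n+1] = x_1 + 1 \,\middle|\, X[n] = x_1, \ \min_{r \geq n+1} X[r] = 0\right) = 1 - q, \]

for all $x_1 \in \mathbb{Z}_+$. We have

\[
\begin{aligned}
& P\left(X[n+1] = x_1 + 1 \,\middle|\, X[n] = x_1, \ \min_{r \geq n+1} X[r] = 0\right) \\
&\quad = \frac{P\left(X[n+1] = x_1 + 1, \ \min_{r \geq n+1} X[r] = 0 \,\middle|\, X[n] = x_1\right)}{P\left(\min_{r \geq n+1} X[r] = 0 \,\middle|\, X[n] = x_1\right)} \\
&\quad \overset{(a)}{=} \frac{P\left(X[n+1] = x_1 + 1 \,\middle|\, X[n] = x_1\right) \cdot P\left(\min_{r \geq n+1} X[r] = 0 \,\middle|\, X[n+1] = x_1 + 1\right)}{P\left(\min_{r \geq n+1} X[r] = 0 \,\middle|\, X[n] = x_1\right)} \\
&\quad \overset{(b)}{=} q \cdot \frac{h(x_1 + 1)}{h(x_1)},
\end{aligned}
\tag{A.7}
\]

where

\[ h(x) = P\left(\min_{r \geq 2} X[r] = 0 \,\middle|\, X[1] = x\right), \]

and steps (a) and (b) follow from the Markov property and stationarity of
$X$, respectively. The values of $\{h(x) : x \in \mathbb{Z}_+\}$ satisfy the set of harmonic
equations

\[
h(x) =
\begin{cases}
q \cdot h(x+1) + (1-q) \cdot h(x-1), & x \geq 1, \\
q \cdot h(1) + 1 - q, & x = 0,
\end{cases}
\tag{A.8}
\]

with the boundary condition

\[ \lim_{x \to \infty} h(x) = 0. \tag{A.9} \]

Solving Eqs. (A.8) and (A.9), we obtain the unique solution

\[ h(x) = \left(\frac{1-q}{q}\right)^{x+1}, \]

for all $x \in \mathbb{Z}_+$. By Eq. (A.7), this implies that

\[ P\left(X[n+1] = x_1 + 1 \,\middle|\, X[n] = x_1, \ \min_{r \geq n+1} X[r] = 0\right) = q \cdot \frac{1-q}{q} = 1 - q, \]

which proves the claim. □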
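
The harmonic equations can be checked with exact rational arithmetic. The sketch below assumes the geometric closed form $h(x) = \left(\frac{1-q}{q}\right)^{x+1}$ and $q > 1/2$ (so that $h(x) \to 0$). The function names are illustrative and do not come from the paper.

```python
from fractions import Fraction


def h(x: int, q: Fraction) -> Fraction:
    """Assumed closed form h(x) = ((1 - q) / q) ** (x + 1)."""
    return ((1 - q) / q) ** (x + 1)


def satisfies_harmonic_equations(q: Fraction, max_x: int = 50) -> bool:
    """Check Eq. (A.8) exactly for x = 0, 1, ..., max_x."""
    # Boundary case x = 0: h(0) = q * h(1) + 1 - q.
    if h(0, q) != q * h(1, q) + (1 - q):
        return False
    # Interior cases x >= 1: h(x) = q * h(x + 1) + (1 - q) * h(x - 1).
    return all(
        h(x, q) == q * h(x + 1, q) + (1 - q) * h(x - 1, q)
        for x in range(1, max_x + 1)
    )


def conditional_up_probability(x1: int, q: Fraction) -> Fraction:
    """Right-hand side of Eq. (A.7): q * h(x1 + 1) / h(x1)."""
    return q * h(x1 + 1, q) / h(x1, q)


q = Fraction(2, 3)  # hypothetical up-step probability, chosen with q > 1/2
print(satisfies_harmonic_equations(q))
print(conditional_up_probability(5, q) == 1 - q)
```

Using `Fraction` rather than floats makes the equalities exact, so a failed check would point to a wrong closed form rather than to rounding error.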



A.4. Proof of Proposition 2.



Proof. (Proposition 2) Claim 1 follows from the well-known steady-state
distribution of a random walk, or equivalently, the fact that $Q[\infty]$ has
the same distribution as the steady-state number of jobs in an M/M/1 queue
with traffic intensity $\rho = \frac{1-p}{\lambda}$. For Claim 2, since $Q$ is an irreducible Markov
chain that is positive recurrent, it follows that its time-average coincides
with $E(Q[\infty])$ almost surely.

The fact that the $E_i$'s are i.i.d. was shown in the discussion preceding Eq. (7.20)
in the proof of Proposition 1. The value of $E(|E_1|)$ follows by combining
Eqs. (4.1) and (7.20).

Let $B_{i,j}$ be the length of the $j$th busy period (defined in Eq. (7.2)) in $E_i$.
By definition, $B_{1,1}$ is distributed as the time till the random walk $Q$ reaches
state 0, starting from state 1. We have

\[ P(B_{1,1} \geq x) \leq P\left(\sum_{j=1}^{\lfloor x \rfloor} X_j \geq -1\right), \]

where the $X_j$'s are i.i.d., with $P(X_1 = 1) = \frac{1-p}{\lambda+1-p}$ and $P(X_1 = -1) = \frac{\lambda}{\lambda+1-p}$,
which, by the Chernoff bound, implies an exponential tail bound for $P(B_{1,1} \geq x)$,
and in particular,

\[ \lim_{\epsilon \downarrow 0} G_{B_{1,1}}(\epsilon) = 1. \tag{A.10} \]
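By Eq. (7.19), the moment generating function for $|E_1|$ is given by

\[
\begin{aligned}
G_{|E_1|}(\epsilon) &= E\left(\exp(\epsilon \cdot |E_1|)\right) \\
&= E\left(\exp\left(\epsilon \cdot \left(1 + \sum_{j=1}^{N_1} B_{1,j}\right)\right)\right) \\
&\overset{(a)}{=} E(e^{\epsilon}) \cdot E\left(\exp\left(N_1 \cdot \ln G_{B_{1,1}}(\epsilon)\right)\right) \\
&= E(e^{\epsilon}) \cdot G_{N_1}\left(\ln\left(G_{B_{1,1}}(\epsilon)\right)\right),
\end{aligned}
\tag{A.11}
\]

where (a) follows from the fact that $\{N_1\} \cup \{B_{1,j} : j \in \mathbb{N}\}$ are mutually
independent, and $G_{N_1}(x) = E(\exp(x \cdot N_1))$. Since $N_1 = \mathrm{Geo}(1 - x) - 1$,
$\lim_{x \downarrow 0} G_{N_1}(x) = 1$, and by Eq. (A.10), we have that $\lim_{\epsilon \downarrow 0} G_{|E_1|}(\epsilon) = 1$, which
implies Eq. (7.28).

Finally, Eq. (7.29) follows from the third claim and the Elementary Renewal
Theorem. □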
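
As a worked check of Claim 1 against the heavy-traffic limit quoted in the abstract, the mean of the steady-state distribution can be computed directly. This is a sketch that assumes the traffic intensity is $\rho = \frac{1-p}{\lambda}$ (the value consistent with the limit $(1-p)/p$) and uses the standard geometric steady-state law of an M/M/1 queue.

```latex
% Geometric steady-state law of an M/M/1 queue with traffic intensity \rho < 1:
P(Q[\infty] = k) = (1 - \rho)\,\rho^{k}, \qquad k \in \mathbb{Z}_+ .
% Its mean, with \rho = (1-p)/\lambda:
E(Q[\infty]) = \sum_{k=0}^{\infty} k\,(1 - \rho)\,\rho^{k}
  = \frac{\rho}{1 - \rho}
  = \frac{1 - p}{\lambda - (1 - p)}
  \;\longrightarrow\; \frac{1 - p}{p}
  \quad \text{as } \lambda \to 1 .
```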

A.5. Proof of Lemma 8.

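Proof. (Lemma 8) By the definition of $F_{X_1}^{-1}$ and the strong law of large
numbers (SLLN), we have

\[ \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}\left(X_i > F_{X_1}^{-1}(\alpha)\right) = E\left(\mathbb{I}\left(X_1 > F_{X_1}^{-1}(\alpha)\right)\right) < \alpha, \quad \text{a.s.} \tag{A.12} \]

Denote by $S_{n,k}$ the set of top $k$ elements in $\{X_i : 1 \leq i \leq n\}$. By Eq. (A.12) and
the fact that $H_n < \alpha n$ a.s., there exists $N > 0$ such that

\[ P\left\{\exists N, \text{ s.t. } \min S_{n,H_n} > F_{X_1}^{-1}(\alpha), \ \forall n \geq N\right\} = 1, \]

which implies that

\[
\begin{aligned}
\limsup_{n \to \infty} f\left(\{X_i : 1 \leq i \leq n\}, H_n\right)
&\leq \limsup_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i \cdot \mathbb{I}\left(X_i > F_{X_1}^{-1}(\alpha)\right) \\
&= E\left(X_1 \cdot \mathbb{I}\left(X_1 > F_{X_1}^{-1}(\alpha)\right)\right) \quad \text{a.s.},
\end{aligned}
\tag{A.13}
\]

where the last equality follows from the SLLN. This proves our claim. □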
A.6. Proof of Lemma 10.

Proof. (Lemma 10)

We begin by stating the following fact:

Lemma 11. Let $\{X_i : i \in \mathbb{N}\}$ be i.i.d. random variables taking values in
$\mathbb{R}_+$, such that for some $a, b > 0$, $P(X_1 \geq x) \leq a \cdot \exp(-b \cdot x)$ for all $x \geq 0$. Then

\[ \max_{1 \leq i \leq n} X_i = o(n), \quad \text{a.s.}, \]

as $n \to \infty$.

Proof.

\[
\begin{aligned}
\lim_{n \to \infty} P\left(\max_{1 \leq i \leq n} X_i \leq \frac{2}{b} \ln n\right)
&= \lim_{n \to \infty} P\left(X_1 \leq \frac{2}{b} \ln n\right)^{n} \\
&\geq \lim_{n \to \infty} \left(1 - a \cdot \exp(-2 \ln n)\right)^{n} \\
&= \lim_{n \to \infty} \left(1 - \frac{a}{n^2}\right)^{n} \\
&= 1.
\end{aligned}
\tag{A.14}
\]

In other words, $\max_{1 \leq i \leq n} X_i \leq \frac{2}{b} \ln n$ a.s. as $n \to \infty$, which proves the claim.

□
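
The limit in Eq. (A.14) can be checked numerically. This is a sketch that evaluates the lower bound $(1 - a \cdot \exp(-2 \ln n))^n = (1 - a/n^2)^n$ for growing $n$. The constant `a = 3.0` and the function name are hypothetical choices for illustration only.

```python
import math


def lower_bound(n: int, a: float) -> float:
    """Evaluate (1 - a * exp(-2 ln n)) ** n, which equals (1 - a / n**2) ** n."""
    return (1.0 - a * math.exp(-2.0 * math.log(n))) ** n


a = 3.0  # hypothetical tail constant from P(X_1 >= x) <= a * exp(-b * x)
for n in (10, 100, 1_000, 10_000):
    # The bound behaves like exp(-a / n) for large n, so it increases towards 1.
    print(n, lower_bound(n, a))
```

The tail rate $b$ does not appear in the bound: the threshold $\frac{2}{b} \ln n$ is chosen so that $b$ cancels inside the exponential.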

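Since the $|E_i|$'s are i.i.d. with $E(|E_1|) = \frac{\lambda+1-p}{\lambda-(1-p)}$ (Proposition 2), we have
that, almost surely,

\[ m_K^{\Psi} = \sum_{i=0}^{K-1} |E_i| \sim E(|E_1|) \cdot K = \frac{\lambda+1-p}{\lambda-(1-p)} \cdot K, \quad \text{as } K \to \infty, \tag{A.15} \]

by the strong law of large numbers. By Lemma 11 and Eq. (7.28), we have

\[ \max_{1 \leq i \leq K} |E_i| = o(K), \quad \text{a.s.}, \tag{A.16} \]

as $K \to \infty$. By Eq. (A.16) and the fact that $I(M^{\Psi}, m_K^{\Psi}) = K$, we have

\[
\begin{aligned}
K - I\left(M^{\Psi}, l(m_K^{\Psi})\right)
&= K - I\left(M^{\Psi}, m_K^{\Psi} - \max_{1 \leq i \leq K} |E_i|\right) \\
&\overset{(a)}{\leq} K - I\left(M^{\Psi}, m_K^{\Psi}\right) + \max_{1 \leq i \leq K} |E_i| \\
&= \max_{1 \leq i \leq K} |E_i| \\
&= o(K), \quad \text{a.s.},
\end{aligned}
\tag{A.17}
\]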
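as $K \to \infty$, where (a) follows from the fact that at most one deletion can
occur in a single slot, and hence $I(M, n + m) \leq I(M, n) + m$ for all $m, n \in \mathbb{N}$.
Since $M$ is feasible,

\[ I(M, n) \leq \frac{p}{\lambda+1-p} \cdot n, \tag{A.18} \]

as $n \to \infty$. We have,

\[
\begin{aligned}
I\left(M, m_K^{\Psi}\right) - I\left(M^{\Psi}, l(m_K^{\Psi})\right)
&= \left(K - I\left(M^{\Psi}, l(m_K^{\Psi})\right)\right) + \left(I\left(M, m_K^{\Psi}\right) - I\left(M^{\Psi}, m_K^{\Psi}\right)\right) \\
&\overset{(a)}{\leq} \left(K - I\left(M^{\Psi}, l(m_K^{\Psi})\right)\right) + \frac{p}{\lambda+1-p} \cdot m_K^{\Psi} - K \\
&\overset{(b)}{\sim} \left(\frac{p}{\lambda+1-p} \cdot \frac{\lambda+1-p}{\lambda-(1-p)} - 1\right) K \\
&= \frac{1-\lambda}{\lambda-(1-p)} \cdot K, \quad \text{a.s.},
\end{aligned}
\]

as $K \to \infty$, where (a) follows from Eqs. (A.15) and (A.18), (b) from Eqs. (A.15)
and (A.17), which completes the proof. □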